You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mahout.apache.org by gs...@apache.org on 2009/11/23 16:14:38 UTC
svn commit: r883365 [9/47] - in /lucene/mahout/trunk: ./ examples/ matrix/ matrix/src/ matrix/src/main/ matrix/src/main/java/ matrix/src/main/java/org/ matrix/src/main/java/org/apache/ matrix/src/main/java/org/apache/mahout/ matrix/src/main/java/org/ap...

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderFactory.java
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderFactory.java?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderFactory.java (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderFactory.java Mon Nov 23 15:14:26 2009
@@ -0,0 +1,575 @@
+/*
+Copyright ï¿½ 1999 CERN - European Organization for Nuclear Research.
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
+is hereby granted without fee, provided that the above copyright notice appear in all copies and 
+that both that copyright notice and this permission notice appear in supporting documentation. 
+CERN makes no representations about the suitability of this software for any purpose. 
+It is provided "as is" without expressed or implied warranty.
+*/
+package org.apache.mahout.jet.stat.quantile;
+
+//import cern.it.util.Doubles;
+import org.apache.mahout.jet.math.Arithmetic;
+import org.apache.mahout.jet.random.engine.RandomEngine;
+/**
+ * Factory constructing exact and approximate quantile finders for both known and unknown <tt>N</tt>.
+ * Also see {@link hep.aida.bin.QuantileBin1D}, demonstrating how this package can be used.
+ *
+ * The approx. algorithms compute approximate quantiles of large data sequences in a single pass.
+ * The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the data sequence.
+ * The main memory requirements are smaller than for any other known technique by an order of magnitude.
+ *
+ * <p>The approx. algorithms are primarily intended to help applications scale.
+ * When faced with a large data sequences, traditional methods either need very large memories or time consuming disk based sorting.
+ * In constrast, the approx. algorithms can deal with > 10^10 values without disk based sorting.
+ *
+ * <p>All classes can be seen from various angles, for example as
+ * <dt>1. Algorithm to compute quantiles.
+ * <dt>2. 1-dim-equi-depth histogram.
+ * <dt>3. 1-dim-histogram arbitrarily rebinnable in real-time.
+ * <dt>4. A space efficient MultiSet data structure using lossy compression.
+ * <dt>5. A space efficient value preserving bin of a 2-dim or d-dim histogram.
+ * <dt>(All subject to an accuracy specified by the user.)
+ * 
+ * <p>Use methods <tt>newXXX(...)</tt> to get new instances of one of the following quantile finders.
+ *
+ * <p><b>1. Exact quantile finding algorithm for known and unknown <tt>N</tt> requiring large main memory.</b></p>
+ * The folkore algorithm: Keeps all elements in main memory, sorts the list, then picks the quantiles.
+ *
+ * 
+ *
+ * 
+ * <p><p><b>2. Approximate quantile finding algorithm for known <tt>N</tt> requiring only one pass and little main memory.</b></p>
+ *
+ * <p>Needs as input the following parameters:<p>
+ * <dt>1. <tt>N</tt> - the number of values of the data sequence over which quantiles are to be determined.
+ * <dt>2. <tt>quantiles</tt> - the number of quantiles to be computed. If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * <dt>3. <tt>epsilon</tt> - the allowed approximation error on quantiles. The approximation guarantee of this algorithm is explicit.
+ *
+ * <p>It is also possible to couple the approximation algorithm with random sampling to further reduce memory requirements. 
+ * With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter "delta".
+ *
+ * <dt>4. <tt>delta</tt> - the probability allowed that the approximation error fails to be smaller than epsilon. Set <tt>delta</tt> to zero for explicit non probabilistic guarantees.
+ * 
+ * <p>After Gurmeet Singh Manku, Sridhar Rajagopalan and Bruce G. Lindsay, 
+ * Approximate Medians and other Quantiles in One Pass and with Limited Memory,
+ * Proc. of the 1998 ACM SIGMOD Int. Conf. on Management of Data,
+ * Paper available <A HREF="http://www-cad.eecs.berkeley.edu/~manku/papers/quantiles.ps.gz"> here</A>.
+ *
+ * 
+ *
+ *
+ * <p><p><b>3. Approximate quantile finding algorithm for unknown <tt>N</tt> requiring only one pass and little main memory.</b></p>
+ * This algorithm requires at most two times the memory of a corresponding approx. quantile finder knowing <tt>N</tt>.
+ *
+ * <p>Needs as input the following parameters:<p>
+ * <dt>2. <tt>quantiles</tt> - the number of quantiles to be computed. If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 1000</tt>.
+ * <dt>2. <tt>epsilon</tt> - the allowed approximation error on quantiles. The approximation guarantee of this algorithm is explicit.
+ *
+ * <p>It is also possible to couple the approximation algorithm with random sampling to further reduce memory requirements. 
+ * With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter "delta".
+ *
+ * <dt>3. <tt>delta</tt> - the probability allowed that the approximation error fails to be smaller than epsilon. Set <tt>delta</tt> to zero for explicit non probabilistic guarantees.
+ * 
+ * <p>After Gurmeet Singh Manku, Sridhar Rajagopalan and Bruce G. Lindsay,
+ * Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets.
+ * Proc. of the 1999 ACM SIGMOD Int. Conf. on Management of Data,
+ * Paper available <A HREF="http://www-cad.eecs.berkeley.edu/~manku/papers/unknown.ps.gz"> here</A>.
+ *
+ * <p><b>Example usage:</b>
+ * 
+ *<pre>
+ * _TODO_
+ *</pre><p>
+ *
+ *
+ * @author wolfgang.hoschek@cern.ch
+ * @version 1.0, 09/24/99
+ * @see KnownDoubleQuantileEstimator
+ * @see UnknownDoubleQuantileEstimator
+ */ 
+/** 
+ * @deprecated until unit tests are in place.  Until this time, this class/interface is unsupported.
+ */
+@Deprecated
+public class QuantileFinderFactory extends Object {
+/**
+ * Make this class non instantiable. Let still allow others to inherit.
+ */
+protected QuantileFinderFactory() {
+}
+/**
+ * Computes the number of buffers and number of values per buffer such that
+ * quantiles can be determined with an approximation error no more than epsilon with a certain probability.
+ *
+ * Assumes that quantiles are to be computed over N values.
+ * The required sampling rate is computed and stored in the first element of the provided <tt>returnSamplingRate</tt> array, which, therefore must be at least of length 1.
+ *
+ * @param N the number of values over which quantiles shall be computed (e.g <tt>10^6</tt>).
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact result, set <tt>epsilon=0.0</tt>;
+ * @param delta the probability that the approximation error is more than than epsilon (e.g. <tt>0.0001</tt>) (<tt>0 &lt;= delta &lt;= 1</tt>). To avoid probabilistic answers, set <tt>delta=0.0</tt>.
+ * @param quantiles the number of quantiles to be computed (e.g. <tt>100</tt>) (<tt>quantiles &gt;= 1</tt>). If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * @param samplingRate output parameter, a <tt>double[1]</tt> where the sampling rate is to be filled in.
+ * @return <tt>long[2]</tt> - <tt>long[0]</tt>=the number of buffers, <tt>long[1]</tt>=the number of elements per buffer, <tt>returnSamplingRate[0]</tt>=the required sampling rate.
+ */
+public static long[] known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate) {
+	returnSamplingRate[0] = 1.0;
+	if (epsilon<=0.0) {
+		// no way around exact quantile search
+		long[] result = new long[2];
+		result[0]=1;
+		result[1]=N;
+		return result;
+	}
+	if (epsilon>=1.0 || delta>=1.0) {
+		// can make any error we wish
+		long[] result = new long[2];
+		result[0]=2;
+		result[1]=1;
+		return result;
+	}
+
+	if (delta > 0.0) {
+		return known_N_compute_B_and_K_slow(N, epsilon, delta, quantiles, returnSamplingRate);
+	}
+	return known_N_compute_B_and_K_quick(N, epsilon);
+}
+/**
+ * Computes the number of buffers and number of values per buffer such that
+ * quantiles can be determined with a <b>guaranteed</b> approximation error no more than epsilon.
+ * Assumes that quantiles are to be computed over N values.
+ * @return <tt>long[2]</tt> - <tt>long[0]</tt>=the number of buffers, <tt>long[1]</tt>=the number of elements per buffer.
+ * @param N the anticipated number of values over which quantiles shall be determined.
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact result, set <tt>epsilon=0.0</tt>;
+ */
+protected static long[] known_N_compute_B_and_K_quick(long N, double epsilon) {
+	final int maxBuffers = 50;
+	final int maxHeight = 50;
+	final double N_double = (double) N;
+	final double c = N_double * epsilon * 2.0;
+	int[] heightMaximums = new int[maxBuffers-1];
+
+	// for each b, determine maximum height, i.e. the height for which x<=0 and x is a maximum
+	// with x = binomial(b+h-2, h-1) - binomial(b+h-3, h-3) + binomial(b+h-3, h-2) - N * epsilon * 2.0
+	for (int b=2; b<=maxBuffers; b++) {
+		int h = 3;
+		
+		while ( h<=maxHeight && // skip heights until x<=0
+				(h-2) * (Arithmetic.binomial(b+h-2, h-1)) -
+				(Arithmetic.binomial(b+h-3, h-3)) +
+				(Arithmetic.binomial(b+h-3, h-2)) - c
+				> 0.0
+			  ) {h++;}
+		//from now on x is monotonically growing...
+		while ( h<=maxHeight && // skip heights until x>0
+				(h-2) * (Arithmetic.binomial(b+h-2, h-1)) -
+				(Arithmetic.binomial(b+h-3, h-3)) +
+				(Arithmetic.binomial(b+h-3, h-2)) - c
+				<= 0.0
+			  ) {h++;}
+		h--; //go back to last height
+
+		// was x>0 or did we loop without finding anything?
+		int hMax;
+		if (h>=maxHeight && 
+				(h-2) * (Arithmetic.binomial(b+h-2, h-1)) -
+				(Arithmetic.binomial(b+h-3, h-3)) +
+				(Arithmetic.binomial(b+h-3, h-2)) - c
+				> 0.0) {
+			hMax=Integer.MIN_VALUE;
+		}
+		else {
+			hMax=h;
+		}
+				
+		heightMaximums[b-2]=hMax; //safe some space
+	} //end for
+
+
+	// for each b, determine the smallest k satisfying the constraints, i.e.
+	// for each b, determine kMin, with kMin = N/binomial(b+hMax-2,hMax-1)
+	long[] kMinimums = new long[maxBuffers-1];
+	for (int b=2; b<=maxBuffers; b++) {
+		int h=heightMaximums[b-2];
+		long kMin=Long.MAX_VALUE;
+		if (h>Integer.MIN_VALUE) {
+			double value = (Arithmetic.binomial(b+h-2, h-1));
+			long tmpK=(long)(Math.ceil(N_double/value));
+			if (tmpK<=Long.MAX_VALUE) {
+				kMin = tmpK;
+			}
+		}
+		kMinimums[b-2]=kMin;
+	}
+
+	// from all b's, determine b that minimizes b*kMin
+	long multMin = Long.MAX_VALUE;
+	int minB = -1;
+	for (int b=2; b<=maxBuffers; b++) {
+		if (kMinimums[b-2]<Long.MAX_VALUE) {
+			long mult = ((long)b) * ((long)kMinimums[b-2]);
+			if (mult<multMin) {
+				multMin=mult;
+				minB = b;
+			}
+		}
+	}
+
+	long b, k;
+	if (minB != -1) { // epsilon large enough?
+		b = minB;
+		k = kMinimums[minB-2];
+	}
+	else {     // epsilon is very small or zero.
+		b = 1; // the only possible solution without violating the 
+		k = N; // approximation guarantees is exact quantile search.
+	}
+
+	long[] result = new long[2];
+	result[0]=b;
+	result[1]=k;
+	return result;
+}
+/**
+ * Computes the number of buffers and number of values per buffer such that
+ * quantiles can be determined with an approximation error no more than epsilon with a certain probability.
+ * Assumes that quantiles are to be computed over N values.
+ * The required sampling rate is computed and stored in the first element of the provided <tt>returnSamplingRate</tt> array, which, therefore must be at least of length 1.
+ * @param N the anticipated number of values over which quantiles shall be computed (e.g 10^6).
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact result, set <tt>epsilon=0.0</tt>;
+ * @param delta the probability that the approximation error is more than than epsilon (e.g. <tt>0.0001</tt>) (<tt>0 &lt;= delta &lt;= 1</tt>). To avoid probabilistic answers, set <tt>delta=0.0</tt>.
+ * @param quantiles the number of quantiles to be computed (e.g. <tt>100</tt>) (<tt>quantiles &gt;= 1</tt>). If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * @param samplingRate a <tt>double[1]</tt> where the sampling rate is to be filled in.
+ * @return <tt>long[2]</tt> - <tt>long[0]</tt>=the number of buffers, <tt>long[1]</tt>=the number of elements per buffer, <tt>returnSamplingRate[0]</tt>=the required sampling rate.
+ */
+protected static long[] known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate) {
+	final int maxBuffers = 50;
+	final int maxHeight = 50;
+	final double N_double = N;
+
+	// One possibility is to use one buffer of size N
+	//
+	long ret_b = 1;
+	long ret_k = N;
+	double sampling_rate = 1.0;
+	long memory = N;
+
+
+	// Otherwise, there are at least two buffers (b >= 2)
+	// and the height of the tree is at least three (h >= 3)
+	//
+	// We restrict the search for b and h to MAX_BINOM, a large enough value for
+	// practical values of    epsilon >= 0.001   and    delta >= 0.00001
+	//
+	final double logarithm = Math.log(2.0*quantiles/delta);
+	final double c = 2.0 * epsilon * N_double;
+	for (long b=2 ; b<maxBuffers ; b++)
+		for (long h=3 ; h<maxHeight ; h++) {
+			double binomial = Arithmetic.binomial(b+h-2, h-1);
+			long tmp = (long) Math.ceil(N_double / binomial);
+			if ((b * tmp < memory) && 
+					((h-2) * binomial - Arithmetic.binomial(b+h-3, h-3) + Arithmetic.binomial(b+h-3, h-2)
+					<= c) ) {
+				ret_k = tmp ;
+				ret_b = b ;
+				memory = ret_k * b;
+				sampling_rate = 1.0 ;
+			}
+			if (delta > 0.0) {
+				double t = (h-2) * Arithmetic.binomial(b+h-2, h-1) - Arithmetic.binomial(b+h-3, h-3) + Arithmetic.binomial(b+h-3, h-2) ;
+				double u = logarithm / epsilon ;
+				double v = Arithmetic.binomial (b+h-2, h-1) ;
+				double w = logarithm / (2.0*epsilon*epsilon) ;
+
+				// From our SIGMOD 98 paper, we have two equantions to satisfy:
+				// t  <= u * alpha/(1-alpha)^2
+				// kv >= w/(1-alpha)^2
+				//
+				// Denoting 1/(1-alpha)    by x,
+				// we see that the first inequality is equivalent to
+				// t/u <= x^2 - x
+				// which is satisfied by x >= 0.5 + 0.5 * sqrt (1 + 4t/u)
+				// Plugging in this value into second equation yields
+				// k >= wx^2/v
+
+				double x = 0.5 + 0.5 * Math.sqrt(1.0 + 4.0*t/u) ;
+				long k = (long) Math.ceil(w*x*x/v) ;
+				if (b * k < memory) {
+					ret_k = k ;
+					ret_b = b ;
+					memory = b * k ;
+					sampling_rate = N_double*2.0*epsilon*epsilon / logarithm ;
+				}
+			}
+		}
+		
+	long[] result = new long[2];
+	result[0]=ret_b;
+	result[1]=ret_k;
+	returnSamplingRate[0]=sampling_rate;
+	return result;
+}
+/**
+ * Returns a quantile finder that minimizes the amount of memory needed under the user provided constraints.
+ *  
+ * Many applications don't know in advance over how many elements quantiles are to be computed. 
+ * However, some of them can give an upper limit, which will assist the factory in choosing quantile finders with minimal memory requirements.  
+ * For example if you select values from a database and fill them into histograms, then you probably don't know how many values you will fill, but you probably do know that you will fill at most <tt>S</tt> elements, the size of your database.
+ *
+ * @param known_N specifies whether the number of elements over which quantiles are to be computed is known or not.
+ * @param N if <tt>known_N==true</tt>, the number of elements over which quantiles are to be computed.
+ *			if <tt>known_N==false</tt>, the upper limit on the number of elements over which quantiles are to be computed. 
+ * 			If such an upper limit is a-priori unknown, then set <tt>N = Long.MAX_VALUE</tt>.
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact result, set <tt>epsilon=0.0</tt>;
+ * @param delta the probability that the approximation error is more than than epsilon (e.g. 0.0001) (0 &lt;= delta &lt;= 1). To avoid probabilistic answers, set <tt>delta=0.0</tt>.
+ * @param quantiles the number of quantiles to be computed (e.g. <tt>100</tt>) (<tt>quantiles &gt;= 1</tt>). If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * @param generator a uniform random number generator. Set this parameter to <tt>null</tt> to use a default generator.
+ * @return the quantile finder minimizing memory requirements under the given constraints.
+ */
+public static DoubleQuantileFinder newDoubleQuantileFinder(boolean known_N, long N, double epsilon, double delta, int quantiles, RandomEngine generator) {
+	//boolean known_N = true;
+	//if (N==Long.MAX_VALUE) known_N = false;
+	// check parameters.
+	// if they are illegal, keep quite and return an exact finder.
+	if (epsilon <= 0.0 || N < 1000) return new ExactDoubleQuantileFinder();
+	if (epsilon > 1) epsilon = 1;
+	if (delta < 0) delta = 0;
+	if (delta > 1) delta = 1;
+	if (quantiles < 1) quantiles = 1;
+	if (quantiles > N) N = quantiles;
+	
+	KnownDoubleQuantileEstimator finder;
+	if (known_N) {
+		double[] samplingRate = new double[1];
+		long[] resultKnown = known_N_compute_B_and_K(N, epsilon, delta, quantiles, samplingRate);
+		long b = resultKnown[0];
+		long k = resultKnown[1];
+		if (b == 1) return new ExactDoubleQuantileFinder();
+		return new KnownDoubleQuantileEstimator((int)b,(int)k, N, samplingRate[0], generator);
+	}
+	else {
+		long[] resultUnknown = unknown_N_compute_B_and_K(epsilon, delta, quantiles);
+		long b1 = resultUnknown[0];
+		long k1 = resultUnknown[1];
+		long h1 = resultUnknown[2];
+		double preComputeEpsilon = -1.0;
+		if (resultUnknown[3] == 1) preComputeEpsilon = epsilon;
+
+		//if (N==Long.MAX_VALUE) { // no maximum N provided by user.
+		
+		// if (true) fixes bug reported by LarryPeranich@fairisaac.com
+		if (true) { // no maximum N provided by user.
+			if (b1 == 1) return new ExactDoubleQuantileFinder();
+			return new UnknownDoubleQuantileEstimator((int)b1,(int)k1, (int)h1, preComputeEpsilon, generator);
+		}
+
+		// determine whether UnknownFinder or KnownFinder with maximum N requires less memory.
+		double[] samplingRate = new double[1];
+		
+		// IMPORTANT: for known finder, switch sampling off (delta == 0) !!!
+		// with knownN-sampling we can only guarantee the errors if the input sequence has EXACTLY N elements.
+		// with knownN-no sampling we can also guarantee the errors for sequences SMALLER than N elements.
+		long[] resultKnown = known_N_compute_B_and_K(N, epsilon, 0, quantiles, samplingRate);
+		
+		long b2 = resultKnown[0];
+		long k2 = resultKnown[1];
+		
+		if (b2 * k2 < b1 * k1) { // the KnownFinder is smaller
+			if (b2 == 1) return new ExactDoubleQuantileFinder();
+			return new KnownDoubleQuantileEstimator((int)b2,(int)k2, N, samplingRate[0], generator);
+		}
+
+		// the UnknownFinder is smaller
+		if (b1 == 1) return new ExactDoubleQuantileFinder();
+		return new UnknownDoubleQuantileEstimator((int)b1,(int)k1, (int)h1, preComputeEpsilon, generator);
+	}
+}
+/**
+ * Convenience method that computes phi's for equi-depth histograms.
+ * This is simply a list of numbers with <tt>i / (double)quantiles</tt> for <tt>i={1,2,...,quantiles-1}</tt>.
+ * @return the equi-depth phi's
+ */
+public static org.apache.mahout.colt.list.DoubleArrayList newEquiDepthPhis(int quantiles) {
+	org.apache.mahout.colt.list.DoubleArrayList phis = new org.apache.mahout.colt.list.DoubleArrayList(quantiles-1);
+	for (int i=1; i<=quantiles-1; i++) phis.add(i / (double)quantiles);
+	return phis;
+}
+/**
+ * Computes the number of buffers and number of values per buffer such that
+ * quantiles can be determined with an approximation error no more than epsilon with a certain probability.
+ *
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact results, set <tt>epsilon=0.0</tt>;
+ * @param delta the probability that the approximation error is more than than epsilon (e.g. <tt>0.0001</tt>) (<tt>0 &lt;= delta &lt;= 1</tt>). To get exact results, set <tt>delta=0.0</tt>.
+ * @param quantiles the number of quantiles to be computed (e.g. <tt>100</tt>) (<tt>quantiles &gt;= 1</tt>). If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * @return <tt>long[4]</tt> - <tt>long[0]</tt>=the number of buffers, <tt>long[1]</tt>=the number of elements per buffer, <tt>long[2]</tt>=the tree height where sampling shall start, <tt>long[3]==1</tt> if precomputing is better, otherwise 0;
+ */
+public static long[] unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles) {
+	return unknown_N_compute_B_and_K_raw(epsilon,delta,quantiles);
+	// move stuff from _raw(..) here and delete _raw(...)
+	
+	/*
+	long[] result_1 = unknown_N_compute_B_and_K_raw(epsilon,delta,quantiles);
+	long b1 = result_1[0];
+	long k1 = result_1[1];
+
+	
+	int quantilesToPrecompute = (int) Doubles.ceiling(1.0 / epsilon);
+	
+	if (quantiles>quantilesToPrecompute) {
+		// try if precomputing quantiles requires less memory.
+		long[] result_2 = unknown_N_compute_B_and_K_raw(epsilon/2.0,delta,quantilesToPrecompute);
+		
+		long b2 = result_2[0];
+		long k2 = result_2[1];
+		if (b2*k2 < b1*k1) {
+			result_2[3] = 1; //precomputation is better
+			result_1 = result_2;
+		}
+	}
+	return result_1;
+	*/
+}
+/**
+ * Computes the number of buffers and number of values per buffer such that
+ * quantiles can be determined with an approximation error no more than epsilon with a certain probability.
+ * <b>You never need to call this method.</b> It is only for curious users wanting to gain some insight into the workings of the algorithms.
+ *
+ * @param epsilon the approximation error which is guaranteed not to be exceeded (e.g. <tt>0.001</tt>) (<tt>0 &lt;= epsilon &lt;= 1</tt>). To get exact result, set <tt>epsilon=0.0</tt>;
+ * @param delta the probability that the approximation error is more than than epsilon (e.g. <tt>0.0001</tt>) (<tt>0 &lt;= delta &lt;= 1</tt>). To get exact results, set <tt>delta=0.0</tt>.
+ * @param quantiles the number of quantiles to be computed (e.g. <tt>100</tt>) (<tt>quantiles &gt;= 1</tt>). If unknown in advance, set this number large, e.g. <tt>quantiles &gt;= 10000</tt>.
+ * @return <tt>long[4]</tt> - <tt>long[0]</tt>=the number of buffers, <tt>long[1]</tt>=the number of elements per buffer, <tt>long[2]</tt>=the tree height where sampling shall start, <tt>long[3]==1</tt> if precomputing is better, otherwise 0;
+ */
+protected static long[] unknown_N_compute_B_and_K_raw(double epsilon, double delta, int quantiles) {
+	// delta can be set to zero, i.e., all quantiles should be approximate with probability 1	
+	if (epsilon<=0.0) {
+		long[] result = new long[4];
+		result[0]=1;
+		result[1]=Long.MAX_VALUE;
+		result[2]=Long.MAX_VALUE;
+		result[3]=0;
+		return result;
+	}
+	if (epsilon>=1.0 || delta>=1.0) {
+		// can make any error we wish
+		long[] result = new long[4];
+		result[0]=2;
+		result[1]=1;
+		result[2]=3;
+		result[3]=0;
+		return result;
+	}
+	if (delta<=0.0) {
+		// no way around exact quantile search
+		long[] result = new long[4];
+		result[0]=1;
+		result[1]=Long.MAX_VALUE;
+		result[2]=Long.MAX_VALUE;
+		result[3]=0;
+		return result;
+	}
+
+	int max_b = 50;
+	int max_h = 50;
+	int max_H = 50;
+	int max_Iterations = 2;
+
+	long best_b = Long.MAX_VALUE;
+	long best_k = Long.MAX_VALUE;
+	long best_h = Long.MAX_VALUE;
+	long best_memory = Long.MAX_VALUE;
+
+	double pow = Math.pow(2.0, max_H);
+	double logDelta =  Math.log(2.0/(delta/quantiles)) / (2.0*epsilon*epsilon);
+	//double logDelta =  Math.log(2.0/(quantiles*delta)) / (2.0*epsilon*epsilon);
+
+	while (best_b == Long.MAX_VALUE && max_Iterations-- > 0) { //until we find a solution
+		// identify that combination of b and h that minimizes b*k.
+		// exhaustive search.
+		for (int b=2; b<=max_b; b++) {
+			for (int h=2; h<=max_h; h++) {
+				double Ld = Arithmetic.binomial(b+h-2, h-1);
+				double Ls = Arithmetic.binomial(b+h-3,h-1);
+				
+				// now we have k>=c*(1-alpha)^-2.
+				// let's compute c.
+				//double c = Math.log(2.0/(delta/quantiles)) / (2.0*epsilon*epsilon*Math.min(Ld, 8.0*Ls/3.0));
+				double c = logDelta / Math.min(Ld, 8.0*Ls/3.0);
+
+				// now we have k>=d/alpha.
+				// let's compute d.				
+				double beta = Ld/Ls;
+				double cc = (beta-2.0)*(max_H-2.0) / (beta + pow - 2.0);
+				double d = (h+3+cc) / (2.0*epsilon);
+				
+				/*
+				double d = (Ld*(h+max_H-1.0)  +  Ls*((h+1)*pow - 2.0*(h+max_H)))   /   (Ld + Ls*(pow-2.0));
+				d = (d + 2.0) / (2.0*epsilon);
+				*/
+
+				// now we have c*(1-alpha)^-2 == d/alpha.
+				// we solve this equation for alpha yielding two solutions
+				// alpha_1,2 = (c + 2*d  +-  Sqrt(c*c + 4*c*d))/(2*d)				
+				double f = c*c + 4.0*c*d;
+				if (f<0.0) continue; // non real solution to equation
+				double root = Math.sqrt(f);
+				double alpha_one = (c + 2.0*d + root)/(2.0*d);
+				double alpha_two = (c + 2.0*d - root)/(2.0*d);
+
+				// any alpha must satisfy 0<alpha<1 to yield valid solutions
+				boolean alpha_one_OK = false;
+				boolean alpha_two_OK = false;
+				if (0.0 < alpha_one && alpha_one < 1.0) alpha_one_OK = true;
+				if (0.0 < alpha_two && alpha_two < 1.0) alpha_two_OK = true;
+				if (alpha_one_OK || alpha_two_OK) {
+					double alpha = alpha_one;
+					if (alpha_one_OK && alpha_two_OK) {
+						// take the alpha that minimizes d/alpha
+						alpha = Math.max(alpha_one, alpha_two);
+					}
+					else if (alpha_two_OK) {
+						alpha = alpha_two;
+					}
+
+					// now we have k=Ceiling(Max(d/alpha, (h+1)/(2*epsilon)))
+					long k = (long) Math.ceil(Math.max(d/alpha, (h+1)/(2.0*epsilon)));
+					if (k>0) { // valid solution?
+						long memory = b*k;
+						if (memory < best_memory) { 
+							// found a solution requiring less memory
+							best_k = k;
+							best_b = b;
+							best_h = h;
+							best_memory = memory;
+						}
+					}
+				}
+			} //end for h
+		} //end for b
+		
+		if (best_b == Long.MAX_VALUE) {
+			System.out.println("Warning: Computing b and k looks like a lot of work!");
+			// no solution found so far. very unlikely. Anyway, try again.
+			max_b *= 2;
+			max_h *= 2;
+			max_H *= 2;
+		}
+	} //end while
+
+	long[] result = new long[4];
+	result[3]=0;
+	if (best_b == Long.MAX_VALUE) {
+		// no solution found.
+		// no way around exact quantile search.
+		result[0]=1;
+		result[1]=Long.MAX_VALUE;
+		result[2]=Long.MAX_VALUE;
+	}
+	else {
+		result[0]=best_b;
+		result[1]=best_k;
+		result[2]=best_h;
+	}
+
+	return result;
+}
+}

Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderFactory.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderTest.java
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderTest.java?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderTest.java (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderTest.java Mon Nov 23 15:14:26 2009
@@ -0,0 +1,439 @@
+/*
+Copyright ï¿½ 1999 CERN - European Organization for Nuclear Research.
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
+is hereby granted without fee, provided that the above copyright notice appear in all copies and 
+that both that copyright notice and this permission notice appear in supporting documentation. 
+CERN makes no representations about the suitability of this software for any purpose. 
+It is provided "as is" without expressed or implied warranty.
+*/
+package org.apache.mahout.jet.stat.quantile;
+
+import org.apache.mahout.colt.Timer;
+import org.apache.mahout.colt.list.DoubleArrayList;
+import org.apache.mahout.colt.list.IntArrayList;
+/**
+ * A class holding test cases for exact and approximate quantile finders.
+ */
+class QuantileFinderTest { 
+/**
+ * Finds the first and last indexes of a specific element within a sorted list.
+ * @return int[]
+ * @param list org.apache.mahout.colt.list.DoubleArrayList
+ * @param element the element to search for
+ */
+protected static IntArrayList binaryMultiSearch(DoubleArrayList list, double element) {
+	int index = list.binarySearch(element);
+	if (index<0) return null; //not found
+
+	int from = index-1;
+	while (from>=0 && list.get(from)==element) from--;
+	from++;
+	
+	int to = index+1;
+	while (to<list.size() && list.get(to)==element) to++;
+	to--;
+	
+	return new IntArrayList(new int[] {from,to});
+}
+/**
+ * Observed epsilon
+ */
+public static double epsilon(int size, double phi, double rank) {
+	double s = size;
+	//System.out.println("\n");
+	//System.out.println("s="+size+", rank="+rank+", phi="+phi+", eps="+Math.abs((rank)/s - phi));
+	//System.out.println("\n");
+	return Math.abs(rank/s - phi);
+}
+/**
+ * Observed epsilon
+ */
+public static double epsilon(DoubleArrayList sortedList, double phi, double element) {
+	double rank = org.apache.mahout.jet.stat.Descriptive.rankInterpolated(sortedList,element);
+	return epsilon(sortedList.size(), phi, rank);
+}
+/**
+ * Observed epsilon
+ */
+public static double epsilon(DoubleArrayList sortedList, DoubleQuantileFinder finder, double phi) {
+	double element = finder.quantileElements(new DoubleArrayList(new double[] {phi})).get(0);
+	return epsilon(sortedList, phi, element); 
+}
+public static void main(String[] args) {
+	testBestBandKCalculation(args);
+	//testQuantileCalculation(args);
+	//testCollapse();
+}
+/**
+ * This method was created in VisualAge.
+ * @return double[]
+ * @param values cern.it.hepodbms.primitivearray.DoubleArrayList
+ * @param phis double[]
+ */
+public static double observedEpsilonAtPhi(double phi, ExactDoubleQuantileFinder exactFinder, DoubleQuantileFinder approxFinder) {
+	int N = (int) exactFinder.size();
+	
+	int exactRank = (int) Utils.epsilonCeiling(phi * N) - 1;
+	//System.out.println("exactRank="+exactRank);
+	exactFinder.quantileElements(new DoubleArrayList(new double[] {phi})).get(0); // just to ensure exactFinder is sorted
+	double approxElement = approxFinder.quantileElements(new DoubleArrayList(new double[] {phi})).get(0); 
+	//System.out.println("approxElem="+approxElement);
+	IntArrayList approxRanks = binaryMultiSearch(exactFinder.buffer, approxElement);
+	int from = approxRanks.get(0);
+	int to = approxRanks.get(1);
+
+	int distance;
+	if (from<=exactRank && exactRank<=to) distance = 0;
+	else {
+		if (from>exactRank) distance=Math.abs(from-exactRank);
+		else distance=Math.abs(exactRank-to);
+	}
+
+	double epsilon = (double)distance / (double)N;
+	return epsilon;
+}
+/**
+ * This method was created in VisualAge.
+ * @return double[]
+ * @param values cern.it.hepodbms.primitivearray.DoubleArrayList
+ * @param phis double[]
+ */
+public static DoubleArrayList observedEpsilonsAtPhis(DoubleArrayList phis, ExactDoubleQuantileFinder exactFinder, DoubleQuantileFinder approxFinder, double desiredEpsilon) {
+	DoubleArrayList epsilons = new DoubleArrayList(phis.size());
+	
+	for (int i=phis.size(); --i >=0;) {
+		double epsilon = observedEpsilonAtPhi(phis.get(i), exactFinder, approxFinder);
+		epsilons.add(epsilon);
+		if (epsilon>desiredEpsilon) System.out.println("Real epsilon = "+epsilon+" is larger than desired by "+(epsilon-desiredEpsilon));
+	}
+	return epsilons;
+}
+/**
+ * Not yet commented.
+ */
+public static void test() {
+	String[] args = new String[20];
+	
+	String size="10000";
+	args[0]=size;
+	
+	//String b="5";
+	String b="12";
+	args[1]=b;
+	
+	String k="2290";
+	args[2]=k;
+
+	String enableLogging="log";
+	args[3]=enableLogging;
+
+	String chunks="10";
+	args[4]=chunks;
+
+	String computeExactQuantilesAlso="exact";
+	args[5]=computeExactQuantilesAlso;
+
+	String doShuffle="shuffle";
+	args[6]=doShuffle;
+
+	String epsilon = "0.001";
+	args[7]=epsilon;
+
+	String delta = "0.0001";
+	//String delta = "0.0001";
+	args[8]=delta;
+
+	String quantiles = "1";
+	args[9]=quantiles;
+
+	String max_N = "-1";
+	args[10]=max_N;
+
+
+	testQuantileCalculation(args);
+}
+/**
+ * This method was created in VisualAge.
+ */
+public static void testBestBandKCalculation(String[] args) {
+	//boolean known_N;
+	//if (args==null) known_N = false;
+	//else known_N = new Boolean(args[0]).booleanValue();
+
+	int[] quantiles = {100,10000};
+	//int[] quantiles = {1,100,10000};
+	
+	long[] sizes = {Long.MAX_VALUE, 1000000, 10000000, 100000000};
+	
+	double[] deltas = {0.0, 0.1, 0.00001};
+	//double[] deltas = {0.0, 0.001, 0.00001, 0.000001};
+	
+	//double[] epsilons = {0.0, 0.01, 0.001, 0.0001, 0.00001};
+	double[] epsilons = {0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001};
+
+
+	
+	//if (! known_N) sizes = new long[] {0};
+	System.out.println("\n\n");
+	//if (known_N) 
+	//	System.out.println("Computing b's and k's for KNOWN N");
+	//else 
+	//	System.out.println("Computing b's and k's for UNKNOWN N");
+	System.out.println("mem [Math.round(elements/1000.0)]");
+	System.out.println("***********************************");
+	Timer timer = new Timer().start();
+
+	for (int q=0; q<quantiles.length; q++) {
+		int p = quantiles[q];
+		System.out.println("------------------------------");
+		System.out.println("computing for p = "+p);
+		for (int s=0; s<sizes.length; s++) {
+			long N = sizes[s];
+			System.out.println("   ------------------------------");
+			System.out.println("   computing for N = "+N);
+			for (int e=0; e<epsilons.length; e++) {
+				double epsilon = epsilons[e];
+				System.out.println("      ------------------------------");
+				System.out.println("      computing for e = "+epsilon);
+				for (int d=0; d<deltas.length; d++) {
+					double delta = deltas[d];
+					for (int knownCounter=0; knownCounter<2; knownCounter++) {
+						boolean known_N;
+						if (knownCounter ==0) known_N = true;
+						else known_N = false;
+
+						DoubleQuantileFinder finder = QuantileFinderFactory.newDoubleQuantileFinder(known_N, N, epsilon, delta, p, null);
+						//System.out.println(finder.getClass().getName());
+						/*
+						double[] returnSamplingRate = new double[1];
+						long[] result;
+						if (known_N) {
+							result = QuantileFinderFactory.known_N_compute_B_and_K(N, epsilon, delta, p, returnSamplingRate);
+						}
+						else {
+							result = QuantileFinderFactory.unknown_N_compute_B_and_K(epsilon, delta, p);
+							long b1 = result[0];
+							long k1 = result[1];
+
+							if (N>=0) {
+								long[] resultKnown = QuantileFinderFactory.known_N_compute_B_and_K(N, epsilon, delta, p, returnSamplingRate);
+								long b2 = resultKnown[0];
+								long k2 = resultKnown[1];
+				
+								if (b2 * k2 < b1 * k1) { // the KnownFinder is smaller
+									result = resultKnown;
+								}
+							}
+						}
+						
+
+						long b = result[0];
+						long k = result[1];
+						*/
+						String knownStr = known_N ? "  known" : "unknown";
+						long mem = finder.totalMemory();
+						if (mem==0) mem = N; 
+						//else if (mem==0 && !known_N && N<0) mem = Long.MAX_VALUE; // actually infinity
+						//else if (mem==0 && !known_N && N>=0) mem = N;
+						//System.out.print("         (e,d,N,p)=("+epsilon+","+delta+","+N+","+p+") --> ");
+						System.out.print("         (known, d)=("+knownStr+", "+delta+") --> ");
+						//System.out.print("(mem,b,k,memF");
+						System.out.print("(MB,mem");
+						//if (known_N) System.out.print(",sampling");
+						//System.out.print(")=("+(Math.round(b*k/1000.0))+","+b+","+k+", "+Math.round(b*k*8/1024.0/1024.0));
+						//System.out.print(")=("+b*k/1000.0+","+b+","+k+", "+b*k*8/1024.0/1024.0+", "+Math.round(b*k*8/1024.0/1024.0));
+						System.out.print(")=("+mem*8.0/1024.0/1024.0+",  "+mem/1000.0+",  "+Math.round(mem*8.0/1024.0/1024.0));
+						//if (known_N) System.out.print(","+returnSamplingRate[0]);
+						System.out.println(")");
+					}
+				}
+			}
+		}
+	}
+
+	timer.stop().display();
+}
+/**
+ * This method was created in VisualAge.
+ */
+public static void testLocalVarDeclarationSpeed(int size) {
+	System.out.println("free="+Runtime.getRuntime().freeMemory());
+	System.out.println("total="+Runtime.getRuntime().totalMemory());
+
+	/*Timer timer = new Timer().start();
+	for (int i=0; i<size; i++) {
+		for (int j=0; j<size; j++) {
+			DoubleBuffer buffer=null;
+			int val=10;
+			double f=1.0f;
+		}
+	}
+	System.out.println(timer.stop());
+	*/
+	
+	Timer timer = new Timer().start();
+	DoubleBuffer buffer;
+	int val;
+	double f;
+	int j;
+
+	for (int i=0; i<size; i++) {
+		for (j=0; j<size; j++) {
+			buffer=null;
+			val=10;
+			f=1.0f;
+		}
+	}
+	System.out.println(timer.stop());
+
+	System.out.println("free="+Runtime.getRuntime().freeMemory());
+	System.out.println("total="+Runtime.getRuntime().totalMemory());
+}
+/**
+ */
+public static void testQuantileCalculation(String[] args) {
+	int size=Integer.parseInt(args[0]);
+	int b=Integer.parseInt(args[1]);
+	int k=Integer.parseInt(args[2]);	
+	//cern.it.util.Log.enableLogging(args[3].equals("log"));
+	int chunks=Integer.parseInt(args[4]);
+	boolean computeExactQuantilesAlso=args[5].equals("exact");
+	boolean doShuffle=args[6].equals("shuffle");
+	double epsilon = new Double(args[7]).doubleValue();
+	double delta = new Double(args[8]).doubleValue();
+	int quantiles = Integer.parseInt(args[9]);	
+	long max_N = Long.parseLong(args[10]);	
+
+
+
+	System.out.println("free="+Runtime.getRuntime().freeMemory());
+	System.out.println("total="+Runtime.getRuntime().totalMemory());
+
+	double[] phis = {0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999, 1.0};
+	//int quantiles = phis.length;
+
+	Timer timer = new Timer();
+	Timer timer2 = new Timer();
+	DoubleQuantileFinder approxFinder;
+
+	approxFinder = QuantileFinderFactory.newDoubleQuantileFinder(false, max_N, epsilon, delta, quantiles, null);
+	System.out.println(approxFinder);
+	//new UnknownApproximateDoubleQuantileFinder(b,k);
+	//approxFinder = new ApproximateDoubleQuantileFinder(b,k);
+	/*
+	double[] returnSamplingRate = new double[1];
+	long[] result = ApproximateQuantileFinder.computeBestBandK(size*chunks, epsilon, delta, quantiles, returnSamplingRate);
+	approxFinder = new ApproximateQuantileFinder((int) result[0], (int) result[1]);
+	System.out.println("epsilon="+epsilon);
+	System.out.println("delta="+delta);
+	System.out.println("samplingRate="+returnSamplingRate[0]);
+	*/
+
+		
+	DoubleQuantileFinder exactFinder = QuantileFinderFactory.newDoubleQuantileFinder(false, -1, 0.0, delta, quantiles, null);
+	System.out.println(exactFinder);
+
+	DoubleArrayList list = new DoubleArrayList(size);
+
+	for (int chunk=0; chunk<chunks; chunk++) {
+		list.setSize(0);
+		int d = chunk*size;
+		timer2.start();
+		for (int i=0; i<size; i++) {
+			list.add((double)(i + d));
+		}
+		timer2.stop();
+		
+
+		
+		//System.out.println("unshuffled="+list);
+		if (doShuffle) {
+			Timer timer3 = new Timer().start();
+			list.shuffle();
+			System.out.println("shuffling took ");
+			timer3.stop().display();
+		}
+		//System.out.println("shuffled="+list);
+		//list.sort();
+		//System.out.println("sorted="+list);
+
+		timer.start();
+		approxFinder.addAllOf(list);
+		timer.stop();
+
+		if (computeExactQuantilesAlso) {
+			exactFinder.addAllOf(list);
+		}
+
+	}
+	System.out.println("list.add() took" + timer2);
+	System.out.println("approxFinder.add() took" + timer);
+
+	//System.out.println("free="+Runtime.getRuntime().freeMemory());
+	//System.out.println("total="+Runtime.getRuntime().totalMemory());
+
+	timer.reset().start();
+
+	//approxFinder.close();
+	DoubleArrayList approxQuantiles = approxFinder.quantileElements(new DoubleArrayList(phis)); 
+
+	timer.stop().display();
+	
+	System.out.println("Phis="+new DoubleArrayList(phis));
+	System.out.println("ApproxQuantiles="+approxQuantiles);
+
+	//System.out.println("MaxLevel of full buffers="+maxLevelOfFullBuffers(approxFinder.bufferSet));
+
+	//System.out.println("total buffers filled="+ approxFinder.totalBuffersFilled);
+	//System.out.println("free="+Runtime.getRuntime().freeMemory());
+	//System.out.println("total="+Runtime.getRuntime().totalMemory());
+
+
+	if (computeExactQuantilesAlso) {
+		System.out.println("Comparing with exact quantile computation...");
+
+		timer.reset().start();
+
+		//exactFinder.close();
+		DoubleArrayList exactQuantiles = exactFinder.quantileElements(new DoubleArrayList(phis));
+		timer.stop().display();
+
+		System.out.println("ExactQuantiles="+exactQuantiles);
+
+
+		//double[] errors1 = errors1(exactQuantiles.elements(), approxQuantiles.elements());
+		//System.out.println("Error1="+new DoubleArrayList(errors1));
+
+		/*
+		final DoubleArrayList buffer = new DoubleArrayList((int)exactFinder.size());
+		exactFinder.forEach(
+			new org.apache.mahout.colt.function.DoubleFunction() {
+				public void apply(double element) {
+					buffer.add(element);
+				}
+			}
+		);
+		*/
+				
+		
+		DoubleArrayList observedEpsilons = observedEpsilonsAtPhis(new DoubleArrayList(phis), (ExactDoubleQuantileFinder) exactFinder, approxFinder, epsilon);
+		System.out.println("observedEpsilons="+observedEpsilons);
+
+		double element = 1000.0f;
+		
+
+		System.out.println("exact phi("+element+")="+exactFinder.phi(element));
+		System.out.println("apprx phi("+element+")="+approxFinder.phi(element));
+
+		System.out.println("exact elem(phi("+element+"))="+exactFinder.quantileElements(new DoubleArrayList(new double[] {exactFinder.phi(element)})));
+		System.out.println("apprx elem(phi("+element+"))="+approxFinder.quantileElements(new DoubleArrayList(new double[] {approxFinder.phi(element)})));		
+	}
+}
+/**
+ * Not yet commented.
+ */
+public static void testRank() {
+	DoubleArrayList list = new DoubleArrayList(new double[] {1.0f, 5.0f, 5.0f, 5.0f, 7.0f, 10.f});
+	//System.out.println(rankOfWithin(5.0f, list));
+}
+}

Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/QuantileFinderTest.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/UnknownDoubleQuantileEstimator.java
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/UnknownDoubleQuantileEstimator.java?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/UnknownDoubleQuantileEstimator.java (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/UnknownDoubleQuantileEstimator.java Mon Nov 23 15:14:26 2009
@@ -0,0 +1,174 @@
+/*
+Copyright ï¿½ 1999 CERN - European Organization for Nuclear Research.
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
+is hereby granted without fee, provided that the above copyright notice appear in all copies and 
+that both that copyright notice and this permission notice appear in supporting documentation. 
+CERN makes no representations about the suitability of this software for any purpose. 
+It is provided "as is" without expressed or implied warranty.
+*/
+package org.apache.mahout.jet.stat.quantile;
+
+import org.apache.mahout.colt.list.DoubleArrayList;
+import org.apache.mahout.colt.list.ObjectArrayList;
+import org.apache.mahout.jet.random.engine.RandomEngine;
+import org.apache.mahout.jet.random.sampling.WeightedRandomSampler;
+/**
+ * Approximate quantile finding algorithm for unknown <tt>N</tt> requiring only one pass and little main memory; computes quantiles over a sequence of <tt>double</tt> elements.
+ * This algorithm requires at most two times the memory of a corresponding approx. quantile finder knowing <tt>N</tt>.
+ *
+ * <p>Needs as input the following parameters:<p>
+ * <dt>1. <tt>quantiles</tt> - the number of quantiles to be computed.
+ * <dt>2. <tt>epsilon</tt> - the allowed approximation error on quantiles. The approximation guarantee of this algorithm is explicit.
+ *
+ * <p>It is also possible to couple the approximation algorithm with random sampling to further reduce memory requirements. 
+ * With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter "delta".
+ *
+ * <dt>3. <tt>delta</tt> - the probability allowed that the approximation error fails to be smaller than epsilon. Set <tt>delta</tt> to zero for explicit non probabilistic guarantees.
+ *
+ * You usually don't instantiate quantile finders by using the constructor. Instead use the factory <tt>QuantileFinderFactor</tt> to do so. It will set up the right parametrization for you.
+ * 
+ * <p>After Gurmeet Singh Manku, Sridhar Rajagopalan and Bruce G. Lindsay,
+ * Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets.
+ * Accepted for Proc. of the 1999 ACM SIGMOD Int. Conf. on Management of Data,
+ * Paper (soon) available <A HREF="http://www-cad.eecs.berkeley.edu/~manku"> here</A>.
+ *
+ * @author wolfgang.hoschek@cern.ch
+ * @version 1.0, 09/24/99
+ * @see QuantileFinderFactory
+ * @see KnownApproximateDoubleQuantileFinder
+ */
+class UnknownDoubleQuantileEstimator extends DoubleQuantileEstimator {
+	protected int currentTreeHeight;
+	final protected int treeHeightStartingSampling;
+	protected WeightedRandomSampler sampler;
+	protected double precomputeEpsilon;
+/**
+ * Constructs an approximate quantile finder with b buffers, each having k elements.
+ * @param b the number of buffers
+ * @param k the number of elements per buffer
+ * @param h the tree height at which sampling shall start.
+ * @param precomputeEpsilon the epsilon for which quantiles shall be precomputed; set this value <=0.0 if nothing shall be precomputed.
+ * @param generator a uniform random number generator.
+ */
+public UnknownDoubleQuantileEstimator(int b, int k, int h, double precomputeEpsilon, RandomEngine generator) {
+	this.sampler = new WeightedRandomSampler(1,generator);
+	setUp(b,k);
+	this.treeHeightStartingSampling = h;
+	this.precomputeEpsilon = precomputeEpsilon;
+	this.clear();
+}
+/**
+ * Not yet commented.
+ */
+protected DoubleBuffer[] buffersToCollapse() {
+	DoubleBuffer[] fullBuffers = bufferSet._getFullOrPartialBuffers();
+
+	sortAscendingByLevel(fullBuffers);
+		
+	// if there is only one buffer at the lowest level, then increase its level so that there are at least two at the lowest level.
+	int minLevel = fullBuffers[1].level();
+	if (fullBuffers[0].level() < minLevel) {
+		fullBuffers[0].level(minLevel);
+	}
+
+	return bufferSet._getFullOrPartialBuffersWithLevel(minLevel);
+}
+/**
+ * Removes all elements from the receiver.  The receiver will
+ * be empty after this call returns, and its memory requirements will be close to zero.
+ */
+public synchronized void clear() {
+	super.clear();
+	this.currentTreeHeight = 1;
+	this.sampler.setWeight(1);
+}
+/**
+ * Returns a deep copy of the receiver.
+ *
+ * @return a deep copy of the receiver.
+ */
+public Object clone() {
+	UnknownDoubleQuantileEstimator copy = (UnknownDoubleQuantileEstimator) super.clone();
+	if (this.sampler != null) copy.sampler = (WeightedRandomSampler) copy.sampler.clone();
+	return copy;
+}
+/**
+ * Not yet commented.
+ */
+protected void newBuffer() {
+	currentBufferToFill = bufferSet._getFirstEmptyBuffer();
+	if (currentBufferToFill==null) throw new RuntimeException("Oops, no empty buffer.");
+
+	currentBufferToFill.level(currentTreeHeight - 1);
+	currentBufferToFill.weight(sampler.getWeight());
+}
+/**
+ * Not yet commented.
+ */
+protected void postCollapse(DoubleBuffer[] toCollapse) {
+	if (toCollapse.length == bufferSet.b()) { //delta for unknown finder
+		currentTreeHeight++;
+		if (currentTreeHeight>=treeHeightStartingSampling) {
+			sampler.setWeight(sampler.getWeight() * 2);
+		}
+	}
+}
+/**
+ * Computes the specified quantile elements over the values previously added.
+ * @param phis the quantiles for which elements are to be computed. Each phi must be in the interval (0.0,1.0]. <tt>phis</tt> must be sorted ascending.
+ * @return the approximate quantile elements.
+ */
+public DoubleArrayList quantileElements(DoubleArrayList phis) {
+	if (precomputeEpsilon<=0.0) return super.quantileElements(phis);
+	
+	int quantilesToPrecompute = (int) Utils.epsilonCeiling(1.0 / precomputeEpsilon);
+	/*
+	if (phis.size() > quantilesToPrecompute) {
+		// illegal use case!
+		// we compute results, but loose explicit approximation guarantees.
+		return super.quantileElements(phis);
+	}
+	*/
+ 
+	//select that quantile from the precomputed set that corresponds to a position closest to phi.
+	phis = phis.copy();
+	double e = precomputeEpsilon;
+	for (int index=phis.size(); --index >= 0;) {
+		double phi = phis.get(index);
+		int i = (int) Math.round( ((2.0*phi/e) - 1.0 ) / 2.0); // finds closest
+		i = Math.min(quantilesToPrecompute-1, Math.max(0,i));
+		double augmentedPhi = (e/2.0)*(1+2*i);
+		phis.set(index,augmentedPhi);				
+	}
+	
+	return super.quantileElements(phis);
+}
+/**
+ * Not yet commented.
+ */
+protected boolean sampleNextElement() {
+	return sampler.sampleNextElement();
+}
+/**
+ * To do. This could faster be done without sorting (min and second min).
+ */
+protected static void sortAscendingByLevel(DoubleBuffer[] fullBuffers) {
+	new ObjectArrayList(fullBuffers).quickSortFromTo(0,fullBuffers.length-1,
+		new java.util.Comparator() {
+			public int compare(Object o1, Object o2) {
+				int l1 = ((DoubleBuffer) o1).level();
+				int l2 = ((DoubleBuffer) o2).level();
+				return l1<l2? -1: l1==l2? 0: +1;
+			}
+		}
+	);
+}
+/**
+ * Returns a String representation of the receiver.
+ */
+public String toString() {
+	StringBuffer buf = new StringBuffer(super.toString());
+	buf.setLength(buf.length()-1);
+	return buf + ", h="+currentTreeHeight+", hStartSampling="+treeHeightStartingSampling+", precomputeEpsilon="+precomputeEpsilon+")";
+}
+}

Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/UnknownDoubleQuantileEstimator.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/Utils.java
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/Utils.java?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/Utils.java (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/Utils.java Mon Nov 23 15:14:26 2009
@@ -0,0 +1,28 @@
+/*
+Copyright ï¿½ 1999 CERN - European Organization for Nuclear Research.
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
+is hereby granted without fee, provided that the above copyright notice appear in all copies and 
+that both that copyright notice and this permission notice appear in supporting documentation. 
+CERN makes no representations about the suitability of this software for any purpose. 
+It is provided "as is" without expressed or implied warranty.
+*/
+package org.apache.mahout.jet.stat.quantile;
+
+/**
+ * Holds some utility methods shared by different quantile finding implementations.
+ */
+class Utils {
+/**
+ * Makes this class non instantiable, but still let's others inherit from it.
+ */
+protected Utils() {
+	throw new RuntimeException("Non instantiable");
+}
+/**
+ * Similar to Math.ceil(value), but adjusts small numerical rounding errors +- epsilon.
+ */
+public static long epsilonCeiling(double value) {
+	double epsilon = 0.0000001;
+	return (long) Math.ceil(value - epsilon);
+}
+}

Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/Utils.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/package.html
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/package.html?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/package.html (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/package.html Mon Nov 23 15:14:26 2009
@@ -0,0 +1,29 @@
+<HTML>
+<BODY>
+Scalable algorithms and data structures to compute approximate quantiles over very large data sequences.
+The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the
+dataset.
+The main memory requirements are smaller than for any other known technique by an order of magnitude.
+
+<p>The approx. algorithms are primarily intended to help applications scale.
+  When faced with a large data sequence, traditional methods either need very large memories or time consuming disk
+  based sorting.
+  In constrast, the approx. algorithms can deal with > 10^10 values without disk based sorting.
+
+<p>All classes can be seen from various angles, for example as
+  <dt>1. Algorithm to compute quantiles.
+  <dt>2. 1-dim-equi-depth histogram.
+  <dt>3. 1-dim-histogram arbitrarily rebinnable in real-time.
+  <dt>4. A space efficient MultiSet data structure using lossy compression.
+  <dt>5. A space efficient value preserving bin of a 2-dim or d-dim histogram.
+  <dt>(All subject to an accuracy specified by the user.)
+
+    Have a look at the documentation of class {@link cern.jet.stat.quantile.QuantileFinderFactory} and the interface
+    {@link cern.jet.stat.quantile.DoubleQuantileFinder} to learn more.
+    Most users will never need to know more than how to use these.
+    Actual implementations of the <tt>QuantileFinder</tt> interface are hidden.
+    They are indirectly constructed via the the factory.
+    <br>
+    Also see {@link hep.aida.bin.QuantileBin1D}, demonstrating how this package can be used.
+</BODY>
+</HTML>

Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/jet/stat/quantile/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/Arrays.java
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/Arrays.java?rev=883365&view=auto
==============================================================================
--- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/Arrays.java (added)
+++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/Arrays.java Mon Nov 23 15:14:26 2009
@@ -0,0 +1,606 @@
+/*
+Copyright 1999 CERN - European Organization for Nuclear Research.
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
+is hereby granted without fee, provided that the above copyright notice appear in all copies and 
+that both that copyright notice and this permission notice appear in supporting documentation. 
+CERN makes no representations about the suitability of this software for any purpose. 
+It is provided "as is" without expressed or implied warranty.
+*/
+package org.apache.mahout.colt;
+
+/**
+ * Array manipulations; complements <tt>java.util.Arrays</tt>.
+ *
+ * @see java.util.Arrays
+ * @see org.apache.mahout.colt.Sorting
+ *
+ * @author wolfgang.hoschek@cern.ch
+ * @version 1.0, 03-Jul-99
+ */
+/** 
+ * @deprecated until unit tests are in place.  Until this time, this class/interface is unsupported.
+ */
+@Deprecated
+public class Arrays extends Object {
+/**
+ * Makes this class non instantiable, but still let's others inherit from it.
+ */
+protected Arrays() {}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static byte[] ensureCapacity(byte[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	byte[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new byte[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static char[] ensureCapacity(char[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	char[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new char[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static double[] ensureCapacity(double[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	double[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new double[newCapacity];
+	    //for (int i = oldCapacity; --i >= 0; ) newArray[i] = array[i];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static float[] ensureCapacity(float[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	float[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new float[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static int[] ensureCapacity(int[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	int[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new int[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static long[] ensureCapacity(long[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	long[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new long[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static Object[] ensureCapacity(Object[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	Object[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new Object[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static short[] ensureCapacity(short[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	short[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new short[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Ensures that a given array can hold up to <tt>minCapacity</tt> elements.
+ *
+ * Returns the identical array if it can hold at least the number of elements specified.
+ * Otherwise, returns a new array with increased capacity containing the same elements, ensuring  
+ * that it can hold at least the number of elements specified by 
+ * the minimum capacity argument.
+ *
+ * @param   minCapacity   the desired minimum capacity.
+ */
+public static boolean[] ensureCapacity(boolean[] array, int minCapacity) {
+	int oldCapacity = array.length;
+	boolean[] newArray;
+	if (minCapacity > oldCapacity) {
+	    int newCapacity = (oldCapacity * 3)/2 + 1;
+		if (newCapacity < minCapacity) {
+			newCapacity = minCapacity;
+		}
+		
+	    newArray = new boolean[newCapacity];
+	    System.arraycopy(array, 0, newArray, 0, oldCapacity);
+	}
+	else {
+		newArray=array;
+	}
+	return newArray;
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(byte[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(char[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(double[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(float[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(int[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(long[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(Object[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(short[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Returns a string representation of the specified array.  The string
+ * representation consists of a list of the arrays's elements, enclosed in square brackets
+ * (<tt>"[]"</tt>).  Adjacent elements are separated by the characters
+ * <tt>", "</tt> (comma and space).
+ * @return a string representation of the specified array.
+ */
+public static String toString(boolean[] array) {
+	StringBuffer buf = new StringBuffer();
+	buf.append("[");
+	int maxIndex = array.length - 1;
+	for (int i = 0; i <= maxIndex; i++) {
+	    buf.append(array[i]);
+	    if (i < maxIndex)
+		buf.append(", ");
+	}
+	buf.append("]");
+	return buf.toString();
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static byte[] trimToCapacity(byte[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    byte oldArray[] = array;
+	    array = new byte[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static char[] trimToCapacity(char[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    char oldArray[] = array;
+	    array = new char[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static double[] trimToCapacity(double[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    double oldArray[] = array;
+	    array = new double[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static float[] trimToCapacity(float[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    float oldArray[] = array;
+	    array = new float[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static int[] trimToCapacity(int[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    int oldArray[] = array;
+	    array = new int[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static long[] trimToCapacity(long[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    long oldArray[] = array;
+	    array = new long[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static Object[] trimToCapacity(Object[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    Object oldArray[] = array;
+	    array = new Object[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static short[] trimToCapacity(short[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    short oldArray[] = array;
+	    array = new short[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+/**
+ * Ensures that the specified array cannot hold more than <tt>maxCapacity</tt> elements.
+ * An application can use this operation to minimize array storage.
+ * <p>
+ * Returns the identical array if <tt>array.length &lt;= maxCapacity</tt>.
+ * Otherwise, returns a new array with a length of <tt>maxCapacity</tt>
+ * containing the first <tt>maxCapacity</tt> elements of <tt>array</tt>.
+ *
+ * @param   maxCapacity   the desired maximum capacity.
+ */
+public static boolean[] trimToCapacity(boolean[] array, int maxCapacity) {
+	if (array.length > maxCapacity) {
+	    boolean oldArray[] = array;
+	    array = new boolean[maxCapacity];
+	    System.arraycopy(oldArray, 0, array, 0, maxCapacity);
+	}
+	return array;
+}
+}