Posted to commits@commons.apache.org by er...@apache.org on 2021/06/10 02:12:15 UTC
[commons-math] 02/02: MATH-1603: Userguide update.
This is an automated email from the ASF dual-hosted git repository.
erans pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-math.git
commit 64474ed9633c4a2856f6bc4becdd22869046a716
Author: Gilles Sadowski <gi...@gmail.com>
AuthorDate: Thu Jun 10 04:09:20 2021 +0200
MATH-1603: Userguide update.
---
src/site/xdoc/userguide/distribution.xml | 27 +++
src/site/xdoc/userguide/index.xml | 9 +-
src/site/xdoc/userguide/random.xml | 315 ++++++++-----------------------
3 files changed, 112 insertions(+), 239 deletions(-)
diff --git a/src/site/xdoc/userguide/distribution.xml b/src/site/xdoc/userguide/distribution.xml
index 74a3e3d..7d5c723 100644
--- a/src/site/xdoc/userguide/distribution.xml
+++ b/src/site/xdoc/userguide/distribution.xml
@@ -58,6 +58,33 @@
can make a difference when <code>p</code> is an attained value of the distribution.
</p>
</subsection>
+
+ <subsection name="8.2 Generating data like an input file"
+ href="empirical">
+ <p>
+ Using the <code>EmpiricalDistribution</code> class, you can generate data based on
+ the values in an input file:
+
+ <source>
+int binCount = 500;
+EmpiricalDistribution empDist = new EmpiricalDistribution(binCount);
+empDist.load("data.txt");
+RealDistribution.Sampler sampler = empDist.createSampler(RandomSource.MT.create());
+double value = sampler.nextDouble(); </source>
+
+ The entire input file is read and a probability density function is estimated
+ based on data from the file.
+ The estimation method is essentially the
+ <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html">
+ Variable Kernel Method</a> with Gaussian smoothing.
+ The created sampler will return random values whose probability distribution
+ matches the empirical distribution (i.e. if you generate a large number of
+ such values, their distribution should "look like" the distribution of the
+ values in the input file).
+ The values are not stored in memory, so there is no limit to the
+ size of the input file.
+ </p>
+ </subsection>
</section>
</body>
</document>
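Reviewer note on the new section 8.2: the idea of "generating data like an input file" can be sketched without any Commons Math dependency. The class below is a hypothetical helper (`BinnedSampler` is not a Commons Math class) that builds a histogram of the data and samples from it. It is a simplified stand-in: it draws uniformly within the selected bin, whereas the user guide describes a variable kernel method with Gaussian smoothing.

```java
import java.util.Random;

/** Hypothetical helper (not part of Commons Math): samples from a histogram of the data. */
public class BinnedSampler {
    private final double min;          // smallest input value
    private final double binWidth;     // width of each histogram bin
    private final double[] cumulative; // cumulative[i] = fraction of data in bins 0..i

    public BinnedSampler(double[] data, int binCount) {
        if (data.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        double lo = data[0];
        double hi = data[0];
        for (double v : data) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = lo;
        // Fall back to a unit width when all values are identical.
        binWidth = hi > lo ? (hi - lo) / binCount : 1.0;

        // Count how many values fall into each bin (the maximum goes into the last bin).
        long[] counts = new long[binCount];
        for (double v : data) {
            int bin = Math.min((int) ((v - min) / binWidth), binCount - 1);
            counts[bin]++;
        }

        // Turn the counts into a cumulative distribution over bins.
        cumulative = new double[binCount];
        long running = 0;
        for (int i = 0; i < binCount; i++) {
            running += counts[i];
            cumulative[i] = (double) running / data.length;
        }
    }

    /** Picks a bin with probability proportional to its count, then a uniform point inside it. */
    public double next(Random rng) {
        double u = rng.nextDouble();
        int bin = 0;
        while (bin < cumulative.length - 1 && u >= cumulative[bin]) {
            bin++;
        }
        return min + (bin + rng.nextDouble()) * binWidth;
    }
}
```

With a large number of draws, the histogram of the returned values should approach the histogram of the input data, which is the "look like" property the section describes. Unlike the class in the patch, this sketch takes the data as an in-memory array.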
diff --git a/src/site/xdoc/userguide/index.xml b/src/site/xdoc/userguide/index.xml
index 2608aa8..992efdc 100644
--- a/src/site/xdoc/userguide/index.xml
+++ b/src/site/xdoc/userguide/index.xml
@@ -50,12 +50,8 @@
<li><a href="random.html">2. Data Generation</a>
<ul>
<li><a href="random.html#a2.1_Overview">2.1 Overview</a></li>
- <li><a href="random.html#a2.2_Random_numbers">2.2 Random numbers</a></li>
- <li><a href="random.html#a2.3_Random_Vectors">2.3 Random Vectors</a></li>
- <li><a href="random.html#a2.4_Random_Strings">2.4 Random Strings</a></li>
- <li><a href="random.html#a2.5_Random_permutations_combinations_sampling">2.5 Random permutations, combinations, sampling</a></li>
- <li><a href="random.html#a2.6_Generating_data_like_an_input_file">2.6 Generating data 'like' an input file</a></li>
- <li><a href="random.html#a2.7_PRNG_Pluggability">2.7 PRNG Pluggability</a></li>
+ <li><a href="random.html#a2.2_Correlated_random_vectors">2.2 Correlated random vectors</a></li>
+ <li><a href="random.html#a2.3_Low_discrepancy_sequences">2.3 Low discrepancy sequences</a></li>
</ul></li>
<li><a href="linear.html">3. Linear Algebra</a>
<ul>
@@ -103,6 +99,7 @@
<li><a href="distribution.html">8. Probability Distributions</a>
<ul>
<li><a href="distribution.html#a8.1_Overview">8.1 Overview</a></li>
+ <li><a href="distribution.html#a8.2_Generating_data_like_an_input_file">8.2 Generating data 'like' an input file</a></li>
</ul></li>
<li><a href="fraction.html">9. Fractions</a>
<ul>
diff --git a/src/site/xdoc/userguide/random.xml b/src/site/xdoc/userguide/random.xml
index 868c96a..cf68305 100644
--- a/src/site/xdoc/userguide/random.xml
+++ b/src/site/xdoc/userguide/random.xml
@@ -28,181 +28,100 @@
<section name="2 Data Generation">
-<subsection name="2.1 Overview"
- href="overview">
+ <subsection name="2.1 Overview"
+ href="overview">
<p>
- The Commons Math <a href="../apidocs/org/apache/commons/math4/random/package-summary.html">o.a.c.m.random</a>
- package includes utilities for
- <ul>
- <li>generating random numbers</li>
- <li>generating random vectors</li>
- <li>generating random strings</li>
- <li>generating cryptographically secure sequences of random numbers or
- strings</li>
- <li>generating random samples and permutations</li>
- <li>analyzing distributions of values in an input file and generating
- values "like" the values in the file</li>
- <li>generating data for grouped frequency distributions or
- histograms</li>
- </ul></p>
+ Utilities in package <a href="../apidocs/org/apache/commons/math4/legacy/random/package-summary.html">
+ o.a.c.m.legacy.random</a> often use an underlying "source of randomness": a pseudo-random
+ number generator (PRNG) that produces sequences of numbers that are uniformly distributed
+ within their range.
+ Commons Math depends on <a href="http://commons.apache.org/rng">Commons RNG</a> for the
+ PRNG implementations.
+ </p>
+ </subsection>
+
+ <subsection name="2.2 Correlated random vectors"
+ href="vectors">
<p>
- These utilities rely on an underlying "source of randomness", which in most
- cases is a pseudo-random number generator (PRNG) that produces sequences
- of numbers that are uniformly distributed within their range.
- Commons Math depends on <a href="http://commons.apache.org/rng">Commons Rng</a>
- for the PRNG implementations.
+ Some algorithms require random vectors instead of random scalars.
+ When the components of these vectors are uncorrelated, they may be generated
+ simply one at a time and packed together in the vector.
</p>
<p>
- A PRNG algorithm is often deterministic, i.e. it produces the same sequence
- when initialized with the same "seed".
- This property is important for some applications like Monte-Carlo simulations,
- but makes such a PRNG often unsuitable for cryptographic purposes.
+ When the components are correlated, however, generating them is more difficult.
+ The <a href="../apidocs/org/apache/commons/math4/legacy/random/CorrelatedVectorFactory.html">
+ CorrelatedVectorFactory</a> class provides this service.
+ In this case, a complete covariance matrix must be provided (instead of a
+ simple vector of standard deviations); it gathers both the variance and the
+ correlation information of the probability law.
+ </p>
+ <p>
+ The main use for correlated random vector generation is for Monte-Carlo
+ simulation of physical problems with several variables, for example to
+ generate error vectors to be added to a nominal vector. A particularly
+ common case is when the generated vector should be drawn from a <a
+ href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution">
+ Multivariate Normal Distribution</a>.
</p>
-</subsection>
-
-<subsection name="2.2 Random Deviates"
- href="deviates">
- <p>
- <dl>
- <dt>Random sequence of numbers from a probability distribution</dt>
- <dd>
- There is no such thing as a single "random number." What can be
- generated are <i>sequences</i> of numbers that appear to be random. When
- using the built-in JDK function <code>Math.random()</code>, sequences of
- values generated follow the
- <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm">
- Uniform Distribution</a>, which means that the values are evenly spread
- over the interval between 0 and 1, with no sub-interval having a greater
- probability of containing generated values than any other interval of the
- same length. The mathematical concept of a
- <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm">
- probability distribution</a> basically amounts to asserting that different
- ranges in the set of possible values of a random variable have
- different probabilities of containing the value. Commons Math supports
- generating random sequences from each of the distributions defined in the
- <a href="../apidocs/org/apache/commons/math4/distribution/package-summary.html">
- o.a.c.m.distribution</a> package.
- Please refer to the <a href="../distribution.html">specific documentation</a>
- for more details.
- </dd>
-
- <dt>Cryptographically secure random sequences</dt>
- <dd>
- It is possible for a sequence of numbers to appear random, but
- nonetheless to be predictable based on the algorithm used to generate the
- sequence.
- When in addition to randomness, strong unpredictability is
- required, a
- <a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator">
- secure random number generator</a>
- should be used to generate values (or strings), for example an instance of
- the JDK-provided <code>SecureRandom</code> generator.
- In general, such secure generator produce sequence based on a source of
- true randomness, and sequences started with the same seed will diverge.
-
- The <a href="../apidocs/org/apache/commons/math4/random/RandomUtils.html">RandomUtils</a>
- class provides a method for wrapping a <code>java.util.Random</code> or
- <code>java.security.SecureRandom</code> instance in an object that implements
- the <a href="http://commons.apache.org/proper/commons-rng/apidocs/org/apache/commons/rng/UniformRandomProvider.html">
- UniformRandomProvider</a> interface:
- <source>
-UniformRandomProvider rg = RandomUtils.asUniformRandomProvider(new java.security.SecureRandom());
-</source>
- </dd>
- </dl>
- </p>
-</subsection>
-<subsection name="2.3 Random Vectors"
- href="vectors">
- <p>
- Some algorithms require random vectors instead of random scalars. When the
- components of these vectors are uncorrelated, they may be generated simply
- one at a time and packed together in the vector. The <a
- href="../apidocs/org/apache/commons/math4/random/UncorrelatedRandomVectorGenerator.html">
- UncorrelatedRandomVectorGenerator</a> class simplifies this
- process by setting the mean and deviation of each component once and
- generating complete vectors. When the components are correlated however,
- generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math4/random/CorrelatedRandomVectorGenerator.html">
- CorrelatedRandomVectorGenerator</a> class provides this service. In this
- case, the user must set up a complete covariance matrix instead of a simple
- standard deviations vector. This matrix gathers both the variance and the
- correlation information of the probability law.
- </p>
- <p>
- The main use for correlated random vector generation is for Monte-Carlo
- simulation of physical problems with several variables, for example to
- generate error vectors to be added to a nominal vector. A particularly
- common case is when the generated vector should be drawn from a <a
- href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution">
- Multivariate Normal Distribution</a>.
- </p>
+ <p>
+ Generating random vectors from a bivariate normal distribution:
- <p><dl>
- <dt>Generating random vectors from a bivariate normal distribution</dt><dd>
- <source>
-// Import common PRNG interface and factory class that instantiates the PRNG.
+ <source>
+import java.util.function.Supplier;
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.RandomSource;
-// Create (and possibly seed) a PRNG (could use any of the CM-provided generators).
+// The imports above provide the common PRNG interface and the factory class that instantiates the PRNG.
+// Create (and possibly seed) a PRNG.
long seed = 17399225432L; // Fixed seed means same results every time
-UniformRandomProvider rg = RandomSource.create(RandomSource.MT, seed);
-
-// Create a GaussianRandomGenerator using "rg" as its source of randomness.
-GaussianRandomGenerator rawGenerator = new GaussianRandomGenerator(rg);
+UniformRandomProvider rng = RandomSource.create(RandomSource.MT, seed);
-// Create a CorrelatedRandomVectorGenerator using "rawGenerator" for the components.
-CorrelatedRandomVectorGenerator generator =
- new CorrelatedRandomVectorGenerator(mean, covariance, 1.0e-12 * covariance.getNorm(), rawGenerator);
+// Create a factory of correlated vectors.
+CorrelatedVectorFactory factory = new CorrelatedVectorFactory(mean, covariance, 1e-12);
+Supplier<double[]> generator = factory.gaussian(rng);
// Use the generator to generate correlated vectors.
-double[] randomVector = generator.nextVector();
+double[] randomVector = generator.get();
... </source>
- The <code>mean</code> argument is a <code>double[]</code> array holding the means
- of the random vector components. In the bivariate case, it must have length 2.
- The <code>covariance</code> argument is a <code>RealMatrix</code>, which has to
- be 2 x 2.
- The main diagonal elements are the variances of the vector components and the
- off-diagonal elements are the covariances.
- For example, if the means are 1 and 2 respectively, and the desired standard deviations
- are 3 and 4, respectively, then we need to use
- <source>
+ The <code>mean</code> argument is a <code>double[]</code> array holding the means
+ of the random vector components. In the bivariate case, it must have length 2.
+ The <code>covariance</code> argument is a <code>RealMatrix</code>, which has to
+ be 2 x 2.
+ The main diagonal elements are the variances of the vector components and the
+ off-diagonal elements are the covariances.
+ For example, if the means are 1 and 2 respectively, and the desired standard deviations
+ are 3 and 4, respectively, then we need to use
+
+ <source>
double[] mean = {1, 2};
double[][] cov = {{9, c}, {c, 16}};
-RealMatrix covariance = MatrixUtils.createRealMatrix(cov); </source>
- where "c" is the desired covariance. If you are starting with a desired correlation,
- you need to translate this to a covariance by multiplying it by the product of the
- standard deviations. For example, if you want to generate data that will give Pearson's
- R of 0.5, you would use c = 3 * 4 * 0.5 = 6.
- </dd>
- </dl></p>
- <p>
- In addition to multivariate normal distributions, correlated vectors from multivariate uniform
- distributions can be generated by creating a
- <a href="../apidocs/org/apache/commons/math4/random/UniformRandomGenerator.html">UniformRandomGenerator</a>
- in place of the
- <code>GaussianRandomGenerator</code> above. More generally, any
- <a href="../apidocs/org/apache/commons/math4/random/NormalizedRandomGenerator.html">NormalizedRandomGenerator</a>
- may be used.
- </p>
+RealMatrix covariance = MatrixUtils.createRealMatrix(cov);
+ </source>
+ where "c" is the desired covariance. If you are starting with a desired correlation,
+ you need to translate this to a covariance by multiplying it by the product of the
+ standard deviations. For example, if you want to generate data that will give Pearson's
+ R of 0.5, you would use c = 3 * 4 * 0.5 = 6.
+ </p>
+ </subsection>
- <p><dl>
- <dt>Low discrepancy sequences</dt>
- <dd>
- There exist several quasi-random sequences with the property that for all values of N, the subsequence
- x<sub>1</sub>, ..., x<sub>N</sub> has low discrepancy, which results in equi-distributed samples.
- While their quasi-randomness makes them unsuitable for most applications (i.e. the sequence of values
- is completely deterministic), their unique properties give them an important advantage for quasi-Monte Carlo simulations.<br/>
- Currently, the following low-discrepancy sequences are supported:
- <ul>
- <li><a href="../apidocs/org/apache/commons/math4/random/SobolSequenceGenerator.html">
- Sobol sequence</a> (pre-configured up to dimension 1000)</li>
- <li><a href="../apidocs/org/apache/commons/math4/random/HaltonSequenceGenerator.html">
- Halton sequence</a> (pre-configured up to dimension 40)</li>
- </ul>
- <source>
+ <subsection name="2.3 Low discrepancy sequences"
+ href="lowdiscrepancy">
+ <p>
+ There exist several quasi-random sequences with the property that for all values of N, the subsequence
+ x<sub>1</sub>, ..., x<sub>N</sub> has low discrepancy, which results in equi-distributed samples.
+ While their quasi-randomness makes them unsuitable for most applications (i.e. the sequence of values
+ is completely deterministic), their unique properties give them an important advantage for quasi-Monte Carlo simulations.<br/>
+ Currently, the following low-discrepancy sequences are supported:
+ <ul>
+ <li><a href="../apidocs/org/apache/commons/math4/legacy/random/SobolSequenceGenerator.html">
+ Sobol sequence</a> (pre-configured up to dimension 1000)</li>
+ <li><a href="../apidocs/org/apache/commons/math4/legacy/random/HaltonSequenceGenerator.html">
+ Halton sequence</a> (pre-configured up to dimension 40)</li>
+ </ul>
+
+ <source>
// Create a Sobol sequence generator for 2-dimensional vectors
RandomVectorGenerator generator = new SobolSequence(2);
@@ -210,85 +129,15 @@ RandomVectorGenerator generator = new SobolSequence(2);
double[] randomVector = generator.nextVector();
... </source>
- The figure below illustrates the unique properties of low-discrepancy sequences when
- generating N samples in the interval [0, 1]. Roughly speaking, such sequences "fill"
- the respective space more evenly which leads to faster convergence in quasi-Monte Carlo
- simulations.<br/>
- <img src="../images/userguide/low_discrepancy_sequences.png"
- alt="Comparison of low-discrepancy sequences"/>
- </dd>
- </dl></p>
-
-</subsection>
-
-<subsection name="2.4 Random Strings"
- href="strings">
- <p>
- The method <code>nextHexString</code> in
- <a href="../apidocs/org/apache/commons/math4/random/RandomUtils.DataGenerator.html">
- RandomUtils.DataGenerator</a> can be used to generate random strings of
- hexadecimal characters.
- It produces sequences of strings with good dispersion properties.
- A string can be generated in two different ways, depending on the value
- of the boolean argument passed to the method (see the Javadoc for more
- details).
+ The figure below illustrates the unique properties of low-discrepancy sequences when
+ generating N samples in the interval [0, 1]. Roughly speaking, such sequences "fill"
+ the respective space more evenly which leads to faster convergence in quasi-Monte Carlo
+ simulations.<br/>
+ <img src="../images/userguide/low_discrepancy_sequences.png"
+ alt="Comparison of low-discrepancy sequences"/>
</p>
-</subsection>
-
-<subsection name="2.5 Random Permutations, Combinations, Sampling"
- href="combinatorics">
- <p>
- To select a random sample of objects in a collection, you can use the
- <code>nextSample</code> method provided by in
- <a href="../apidocs/org/apache/commons/math4/random/RandomUtils.DataGenerator.html">
- RandomUtils.DataGenerator</a>.
- Specifically, if <code>c</code> is a <code>java.util.Collection<T></code>
- containing at least <code>k</code> objects, and <code>randomData</code> is a
- <code>RandomUtils.DataGenerator</code> instance <code>randomData.nextSample(c, k)</code>
- will return an <code>List<T></code> instance of size <code>k</code>
- consisting of elements randomly selected from the collection.
- If <code>c</code> contains duplicate references, there may be duplicate
- references in the returned array; otherwise returned elements will be
- unique (i.e. the sampling is without replacement among the object
- references in the collection).
- </p>
-
- <p>
- If <code>n</code> and <code>k</code> are integers with <code>k < n</code>, then
- <code>randomData.nextPermutation(n, k)</code> returns an <code>int[]</code>
- array of length <code>k</code> whose whose entries are selected randomly,
- without repetition, from the integers <code>0</code> through
- <code>n-1</code> (inclusive).
- </p>
-</subsection>
-
-<subsection name="2.6 Generating data like an input file"
- href="empirical">
- <p>
- Using the <code>EmpiricalDistribution</code> class, you can generate data based on
- the values in an input file:
- <dl>
- <source>
-int binCount = 500;
-EmpiricalDistribution empDist = new EmpiricalDistribution(binCount);
-empDist.load("data.txt");
-RealDistribution.Sampler sampler = empDist.createSampler(RandomSource.create(RandomSource.MT));
-double value = sampler.nextDouble(); </source>
- The entire input file is read and a probability density function is estimated
- based on data from the file.
- The estimation method is essentially the
- <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html">
- Variable Kernel Method</a> with Gaussian smoothing.
- The created sampler will return random values whose probability distribution
- matches the empirical distribution (i.e. if you generate a large number of
- such values, their distribution should "look like" the distribution of the
- values in the input file.
- The values are not stored in memory in this case either, so there is no limit to the
- size of the input file.
- </dl>
- </p>
-</subsection>
+ </subsection>
</section>
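Reviewer note on section 2.2: the covariance arithmetic in the patched text (c = 3 * 4 * 0.5 = 6) and the effect of a covariance matrix on the generated vectors can be checked with a small standalone sketch. The class below is hypothetical and uses only the JDK; it is not the `CorrelatedVectorFactory` implementation. It assumes a positive-definite 2 x 2 covariance matrix and uses a Cholesky factor, which is one standard way to turn independent Gaussian draws into correlated ones.

```java
import java.util.Random;

/** Hypothetical illustration; not the CorrelatedVectorFactory implementation. */
public class BivariateGaussianSketch {
    /** Converts a correlation into a covariance: c = rho * sigma1 * sigma2. */
    public static double covariance(double sigma1, double sigma2, double rho) {
        return rho * sigma1 * sigma2;
    }

    /**
     * Draws one vector from a bivariate normal distribution.
     * Assumes cov is a symmetric, positive-definite 2 x 2 matrix.
     */
    public static double[] sample(double[] mean, double[][] cov, Random rng) {
        // Cholesky factor L of the covariance matrix, so that cov = L * L^T.
        double l11 = Math.sqrt(cov[0][0]);
        double l21 = cov[1][0] / l11;
        double l22 = Math.sqrt(cov[1][1] - l21 * l21);

        // Two independent standard normal draws.
        double z1 = rng.nextGaussian();
        double z2 = rng.nextGaussian();

        // x = mean + L * z has the requested means and covariance.
        return new double[] {
            mean[0] + l11 * z1,
            mean[1] + l21 * z1 + l22 * z2
        };
    }
}
```

For the example in the guide, `covariance(3, 4, 0.5)` gives the off-diagonal entry 6, so the matrix is `{{9, 6}, {6, 16}}`. Over many draws of `sample(new double[] {1, 2}, cov, rng)`, the sample correlation of the two components should be close to 0.5.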