You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2020/01/17 12:15:40 UTC

[rng] FiniteDoubleSampler to sample uniformly from a range of representable double values

The UniformRandomProvider and ContinuousUniformSampler can create 
doubles uniformly in a range [0, 1) and [x, y).

I would like to create a sampler that can create doubles with random 
mantissa bits and a specified range of exponents. These would not follow 
a standard distribution but would be distributed according to the IEEE 
754 floating-point "double format" bit layout.

The sampler would ensure all bits of the mantissa are uniformly sampled 
and the exponent range is uniformly sampled. It could be used for 
example to create random double data in a specified range for testing.

Following from UniformRandomProvider I propose to remove the sign bit to 
allow a maximum range sampler from [0, Double.MAX_VALUE]. Thus we have 
the API:

public final class FiniteDoubleSampler {

/**
  * Creates a new finite double sampler.
  *
  * <p>This will return double values from the entire exponent range of 
a finite
  * {@code double} including sub-normal numbers. The value will be unsigned.
  *
  * @param rng Generator of uniformly distributed random numbers.
  * @return Sampler.
  * @since 1.4
  */
public static SharedStateContinuousSampler of(UniformRandomProvider rng);

/**
  * Creates a new finite double sampler.
  *
  * <p>This will return double values from the specified exponent range 
of a finite
  * {@code double}. This assumes all sub-normal numbers are identified 
with the exponent -1023.
  * The value will be unsigned.
  *
  * @param rng Generator of uniformly distributed random numbers.
  * @param minExponent Minimum exponent
  * @param maxExponent Maximum exponent
  * @see Double#MIN_EXPONENT
  * @see Double#MAX_EXPONENT
  * @throws IllegalArgumentException If the exponents are not in the 
range -1023 inclusive
  * to 1023 inclusive; or the min exponent is not {@code <=} max exponent.
  * @return Sampler.
  * @since 1.4
  */
public static SharedStateContinuousSampler of(UniformRandomProvider rng,
                                               int minExponent,
                                               int maxExponent);
}

I have written many tests where I wanted full precision random mantissas 
in double values and wanted the doubles to represent all doubles that 
could occur. Thus randomly sampling from the IEEE 754 representation 
seems to be more generic than for example rng.nextDouble() * constant. 
For example:

// Random numbers: [0, Double.MAX_VALUE]
FiniteDoubleSampler.of(rng);
FiniteDoubleSampler.of(rng, -1023, 1023);
// Random sub-normal numbers: [0, Double.MIN_NORMAL)
FiniteDoubleSampler.of(rng, -1023, -1023);
// Random numbers that are close to overflow: (Double.MAX_VALUE/2, 
Double.MAX_VALUE]
FiniteDoubleSampler.of(rng, 1023, 1023);
// Random numbers in the range [1, 32)
FiniteDoubleSampler.of(rng, 0, 4);

Thoughts on this?

Alex

Re: [rng] FiniteDoubleSampler to sample uniformly from a range of representable double values

Posted by Alex Herbert <al...@gmail.com>.

On 17/01/2020 17:21, Alex Herbert wrote:
>
> A uniform sampler of the IEEE 754 finite doubles would sample from a 
> log-uniform distribution as the limiting distribution and I don't 
> think that is actually what people require when creating random double 
> data. So either it: (a) is discarded as an idea; (b) is well 
> documented and put in examples; (c) is actually formalised to 
> correctly sample from the log-uniform distribution using the PDF:
>
> 1 / (x (log(b) - log(a)))
>
> for x in the interval [a, b] from a uniform deviate (U) in the range 
> of the natural logarithm of the interval:
>
> X = exp(U(log(a), log(b)))
>
I think that adding a LogUniformSampler may be of use.

It would allow creation of data approximately uniform in the 
distribution of exponents. The numbers would be uniform on a log scale. 
The implementation is simple and I don't see the harm in adding it since 
it is a valid distribution with known PDF and CDF.

Likewise the LogUniformDistribution can be added to commons-statistics 
distribution. There already is a vaguely related 
LogNormalDistribution/NormalDistribution pair so this would be and the 
LogUniformDistribution/UniformContinuousDistribution pair. I don't see 
the need to name it LogUniformContinuousDistribution as there is no 
discrete version that I could find documented.

Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [rng] FiniteDoubleSampler to sample uniformly from a range of representable double values

Posted by Alex Herbert <al...@gmail.com>.

On 17/01/2020 13:33, Gilles Sadowski wrote:
> Hi.
>
> Le ven. 17 janv. 2020 à 13:15, Alex Herbert <al...@gmail.com> a écrit :
>> The UniformRandomProvider and ContinuousUniformSampler can create
>> doubles uniformly in a range [0, 1) and [x, y).
>>
>> I would like to create a sampler that can create doubles with random
>> mantissa bits and a specified range of exponents. These would not follow
>> a standard distribution but would be distributed according to the IEEE
>> 754 floating-point "double format" bit layout.
>>
>> The sampler would ensure all bits of the mantissa are uniformly sampled
>> and the exponent range is uniformly sampled. It could be used for
>> example to create random double data in a specified range for testing.
> Are there use-cases other than providing inputs for unit tests?

Maybe. I haven't got one though.

However to use it from other places it would have to be in a distributed 
maven artifact. So either it goes in commons-rng-sampling or perhaps 
commons-rng-examples-sampling.

>> Following from UniformRandomProvider I propose to remove the sign bit to
>> allow a maximum range sampler from [0, Double.MAX_VALUE].
> Is there a trade-off to allowing generation of the whole range of "double"s?
No. I can change the API so the sign bit is optionally removed. At least 
in some tests it is advantageous to know the data will be randomly 
signed or only unsigned.
>> Thus we have
>> the API:
>>
>> public final class FiniteDoubleSampler {
>>
>> /**
>>    * Creates a new finite double sampler.
>>    *
>>    * <p>This will return double values from the entire exponent range of
>> a finite
>>    * {@code double} including sub-normal numbers. The value will be unsigned.
>>    *
>>    * @param rng Generator of uniformly distributed random numbers.
>>    * @return Sampler.
>>    * @since 1.4
>>    */
>> public static SharedStateContinuousSampler of(UniformRandomProvider rng);
>>
>> /**
>>    * Creates a new finite double sampler.
>>    *
>>    * <p>This will return double values from the specified exponent range
>> of a finite
>>    * {@code double}. This assumes all sub-normal numbers are identified
>> with the exponent -1023.
>>    * The value will be unsigned.
>>    *
>>    * @param rng Generator of uniformly distributed random numbers.
>>    * @param minExponent Minimum exponent
>>    * @param maxExponent Maximum exponent
>>    * @see Double#MIN_EXPONENT
>>    * @see Double#MAX_EXPONENT
>>    * @throws IllegalArgumentException If the exponents are not in the
>> range -1023 inclusive
>>    * to 1023 inclusive; or the min exponent is not {@code <=} max exponent.
>>    * @return Sampler.
>>    * @since 1.4
>>    */
>> public static SharedStateContinuousSampler of(UniformRandomProvider rng,
>>                                                 int minExponent,
>>                                                 int maxExponent);
>> }
>>
>> I have written many tests where I wanted full precision random mantissas
>> in double values and wanted the doubles to represent all doubles that
>> could occur.
> How do you ensure it (short of listing them all)?

You cannot practically list all doubles. What you want is a 
representative set of all doubles. I think this idea satisfies that as 
doubles are not uniform on a linear scale.

>
>> Thus randomly sampling from the IEEE 754 representation
>> seems to be more generic than for example rng.nextDouble() * constant.
> What's the advantage of the former over the latter?

Given that doubles are not uniform on a linear scale if you use 
rng.nextDouble() * constant you are biasing your samples to higher 
exponents. This may or may not be what you want.

If we examine nextDouble() it will uniformly sample all the 2^53 dyadic 
rationals in the interval [0, 1). These have a large variety of 
exponents. I do not know the distribution of only the mantissa bits but 
I would guess that they may not be uniform in order to fall exactly on 
the spacing interval 1/2^53. For example the first 4 numbers:

1/2^53
2/2^53 = 1/2^52
3/2^53 = 1/2^52 + 1/2^53
4/2^53 = 1/2^51

would have regular mantissas using only 0 or 1 bits. The mantissa is the 
same for 1, 2, 4 and only different for 3.

The interval [0, 1) can be divided into [0, 0.5) and [0.5, 1). The 
second interval has only 1 exponent. So half the 2^53 random numbers 
must be represented here using the mantissa and it requires all but the 
bit combinations in the mantissa. But in the range below 0.5 the 
mantissa bits will begin to be more and more sparsely used as the 
representation shifts to using the exponent in the range -2 to -53.

What happens when you multiply by a constant?

IIUC this is equivalent:

x * y = s * m * 2^(e-1075)  *  s * m * 2^(e-1075)

where:

  long bits = Double.doubleToLongBits(x or y);
  int s = ((bits >> 63) == 0) ? 1 : -1;
  int e = (int)((bits >> 52) & 0x7ffL);
  long m = (e == 0) ?
                  (bits & 0xfffffffffffffL) << 1 :
                  (bits & 0xfffffffffffffL) | 0x10000000000000L;

So ignoring the sign and exponent (which will just change in magnitude) 
the result can be computed by multiplying the two normalised mantissas, e.g.

// Conditioned so that the exponent of the result is the same and can be 
ignored
final double x = 1.23;
final double y = 1.56;
// Convert to a normalised mantissa
BigInteger bx = BigInteger.valueOf((Double.doubleToLongBits(x) & 
0xfffffffffffffL) | 0x10000000000000L);
BigInteger by = BigInteger.valueOf((Double.doubleToLongBits(y) & 
0xfffffffffffffL) | 0x10000000000000L);
// multiply
BigInteger result = bx.multiply(by);
// Convert 105-bit result (53-bit * 53-bit) back to 53 bit result and 
remove leading bit
// to create 52 bit mantissa result (without assumed leading bit)
BigInteger xy = result.shiftRight(52).clearBit(52);
// Round to nearest. If the most-significant bit that was dropped was 
odd then round up.
// In this case we ignore exactly half way for simplicity.
if (result.testBit(51)) {
   xy = xy.add(BigInteger.ONE);
}
Assertions.assertEquals(x * y,
     Double.longBitsToDouble(xy.longValue() | (Math.getExponent(x) + 
1023L) << 52));

The result is that we know that for x * y like the rng.nextDouble() * 
constant the result can be computed from the normalised mantissas 
multiplied in long arithmetic and the mantissa is set using the upper 53 
bits of the result.

This (x * y >> m) is a multiply congruential generator. The amount of 
randomness in the result then becomes determined by quality of the 
multiplier. If it has a mantissa that is not a good multiplier then the 
randomness of the mantissa of the output doubles will not be very 
random. This is assuming that the randomness of the mantissa of x is 
good, which it may not be.

>
>> For example:
>>
>> // Random numbers: [0, Double.MAX_VALUE]
>> FiniteDoubleSampler.of(rng);
>> FiniteDoubleSampler.of(rng, -1023, 1023);
>> // Random sub-normal numbers: [0, Double.MIN_NORMAL)
>> FiniteDoubleSampler.of(rng, -1023, -1023);
>> // Random numbers that are close to overflow: (Double.MAX_VALUE/2,
>> Double.MAX_VALUE]
>> FiniteDoubleSampler.of(rng, 1023, 1023);
>> // Random numbers in the range [1, 32)
>> FiniteDoubleSampler.of(rng, 0, 4);
> Will the distribution be the same as for "32 * rng.nextDouble()"?

No. rng.nextDouble() will only have 2^53 possible numbers uniformly 
distributed. They will have mantissa bits subject to the distribution 
required to created the dyadic rationals. The total possible in 32 * 
rng.nextDouble() is 2^5 * 2^53 = 2^58. The distribution will be heavily 
weighted towards smaller numbers.

A sampler of doubles with a fixed exponent would produce uniform output:

s * m * 2^(e-1075)

Since only m is changing linearly then the output is uniform.

As soon as the exponent is also varied then samples are distributed 
uniformly within the same exponent but biased to smaller numbers. Thus 
the distribution of numbers over the range [a, b) is a log-uniform 
distribution [1, 2]. However within each block of numbers with the same 
exponent the distribution is uniform and so the log-uniform is the 
limiting distribution.

So on rationalisation perhaps this type of sampler is a bad idea as it 
requires an understanding that the distribution over a range of 
exponents will not produce numbers distributed uniformly over the range.

An alternative is to have a sampler that outputs a random double only 
for a specified exponent, for example in the range [0.5, 1), or [1, 2). 
This uses every possible mantissa bit. This would be a better source of 
random doubles for a specific range. When multiplied by a constant the 
result may not be perfect sampling of the mantissa bits any more (unless 
the constant is a power of 2).

Generation of nextDouble() and nextInterval() is of the form:

long bits = rng.nextLong();
nextDouble() = (bits >>> 11) * 1.0e-53;
nextInterval() = Double.longBitsToDouble((bits >>> 12) | (1023 << 52)); 
// for [1, 2)

So generation is faster. If you want to create a uniform deviate from 
this you have to subtract the range. This was looked at in the JMH 
examples class FloatingPointGenerationPerformance benchmark and is slower:

return Double.longBitsToDouble(0x3ffL << 52 | source.nextLong() >>> 12) 
- 1.0;

I think the result is would be the same 2^53 dyadic rationals. This is 
partially tested in the core unit test for NumberFactory but it does not 
go as far as looking at the uniformity of these alternative methods.

So the conclusion is that a uniform interval sampler may be of some use 
if you want uniform numbers in an interval that is defined by [a/2, a). 
I will have a think about whether this is useful. It applies if you want 
random big numbers multiplied by random small ones as I am currently 
testing with LinearCombination in numbers. So I will try the idea there 
to create some random data for testing.

A uniform sampler of the IEEE 754 finite doubles would sample from a 
log-uniform distribution as the limiting distribution and I don't think 
that is actually what people require when creating random double data. 
So either it: (a) is discarded as an idea; (b) is well documented and 
put in examples; (c) is actually formalised to correctly sample from the 
log-uniform distribution using the PDF:

1 / (x (log(b) - log(a)))

for x in the interval [a, b] from a uniform deviate (U) in the range of 
the natural logarithm of the interval:

X = exp(U(log(a), log(b)))

Alex

[1] 
https://docs.scipy.org/doc/scipy-1.4.0/reference/tutorial/stats/continuous_loguniform.html

[2] https://en.wikipedia.org/wiki/Reciprocal_distribution

>
> Regards,
> Gilles
>
>> Thoughts on this?
>>
>> Alex
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [rng] FiniteDoubleSampler to sample uniformly from a range of representable double values

Posted by Gilles Sadowski <gi...@gmail.com>.

Hi.

Le ven. 17 janv. 2020 à 13:15, Alex Herbert <al...@gmail.com> a écrit :
>
> The UniformRandomProvider and ContinuousUniformSampler can create
> doubles uniformly in a range [0, 1) and [x, y).
>
> I would like to create a sampler that can create doubles with random
> mantissa bits and a specified range of exponents. These would not follow
> a standard distribution but would be distributed according to the IEEE
> 754 floating-point "double format" bit layout.
>
> The sampler would ensure all bits of the mantissa are uniformly sampled
> and the exponent range is uniformly sampled. It could be used for
> example to create random double data in a specified range for testing.

Are there use-cases other than providing inputs for unit tests?

>
> Following from UniformRandomProvider I propose to remove the sign bit to
> allow a maximum range sampler from [0, Double.MAX_VALUE].

Is there a trade-off to allowing generation of the whole range of "double"s?

> Thus we have
> the API:
>
> public final class FiniteDoubleSampler {
>
> /**
>   * Creates a new finite double sampler.
>   *
>   * <p>This will return double values from the entire exponent range of
> a finite
>   * {@code double} including sub-normal numbers. The value will be unsigned.
>   *
>   * @param rng Generator of uniformly distributed random numbers.
>   * @return Sampler.
>   * @since 1.4
>   */
> public static SharedStateContinuousSampler of(UniformRandomProvider rng);
>
> /**
>   * Creates a new finite double sampler.
>   *
>   * <p>This will return double values from the specified exponent range
> of a finite
>   * {@code double}. This assumes all sub-normal numbers are identified
> with the exponent -1023.
>   * The value will be unsigned.
>   *
>   * @param rng Generator of uniformly distributed random numbers.
>   * @param minExponent Minimum exponent
>   * @param maxExponent Maximum exponent
>   * @see Double#MIN_EXPONENT
>   * @see Double#MAX_EXPONENT
>   * @throws IllegalArgumentException If the exponents are not in the
> range -1023 inclusive
>   * to 1023 inclusive; or the min exponent is not {@code <=} max exponent.
>   * @return Sampler.
>   * @since 1.4
>   */
> public static SharedStateContinuousSampler of(UniformRandomProvider rng,
>                                                int minExponent,
>                                                int maxExponent);
> }
>
> I have written many tests where I wanted full precision random mantissas
> in double values and wanted the doubles to represent all doubles that
> could occur.

How do you ensure it (short of listing them all)?

> Thus randomly sampling from the IEEE 754 representation
> seems to be more generic than for example rng.nextDouble() * constant.

What's the advantage of the former over the latter?

> For example:
>
> // Random numbers: [0, Double.MAX_VALUE]
> FiniteDoubleSampler.of(rng);
> FiniteDoubleSampler.of(rng, -1023, 1023);
> // Random sub-normal numbers: [0, Double.MIN_NORMAL)
> FiniteDoubleSampler.of(rng, -1023, -1023);
> // Random numbers that are close to overflow: (Double.MAX_VALUE/2,
> Double.MAX_VALUE]
> FiniteDoubleSampler.of(rng, 1023, 1023);
> // Random numbers in the range [1, 32)
> FiniteDoubleSampler.of(rng, 0, 4);

Will the distribution be the same as for "32 * rng.nextDouble()"?

Regards,
Gilles

> Thoughts on this?
>
> Alex
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org