You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Alexey Dievsky <di...@gmail.com> on 2015/11/19 10:34:46 UTC

Fwd: [Math] Tabulated logarithmic factorial

Hi,

I recently submitted an issue to JIRA (
https://issues.apache.org/jira/browse/MATH-1293) and received an advice to
discuss it here.

In short, the current implementation of CombinatoricsUtils#factorialLog
employs direct summation and thus has linear complexity (O(n) additions and
O(n) logs). I suggest replacing it with a tabulated lookup. This solution
will need to allocate some additional memory, but the speed-up for larger n
would be tremendous. For the values that are too large to be looked up,
equivalent Gamma#logGamma is still much faster than the existing algorithm.

I attached a proposed patch to the issue. However, there are a lot of
details that need to be worked out, first and foremost whether such
optimization is at all worth it. Thus I ask for your opinion. Please let me
know what think.

Sincerely,
Aleksei Dievskii.

Re: Fwd: [Math] Tabulated logarithmic factorial

Posted by Alexey Dievsky <di...@gmail.com>.
Hi,

just a quick update: assisted lookup is significantly faster than
Gamma#logGamma when the STEP parameter is about 8 or less.
So, the implementation variants seem to be the following:
1. leave one static lookup-based version of the method; requires fixing
some arbitrary constants and introducing increased memory footprint even
for those users who do not require fast factorial;
2. create a helper class on demand; requires more work on the part of the
user (not to mention knowing where to look for it), but provides a more
flexible solution.

Sincerely,
Aleksei Dievskii.

On Thu, Nov 19, 2015 at 12:29 PM, Alexey Dievsky <di...@gmail.com> wrote:

> Hi Gilles,
>
> thanks for your response. I've run a small JMH benchmark of both old
> (direct summation) and new (default 16KB tabulation) approaches, and here
> are the results:
>
> Benchmark                                           Mode  Samples
> Score      Error   Units
> o.a.c.m.s.FactorialLogBench.newFactorialLog100     thrpt       20
> 75066.697 ± 7122.980  ops/ms
> o.a.c.m.s.FactorialLogBench.newFactorialLog1000    thrpt       20
> 76905.243 ± 5965.782  ops/ms
> o.a.c.m.s.FactorialLogBench.newFactorialLog10K     thrpt       20
> 5770.486 ± 285.069  ops/ms
> o.a.c.m.s.FactorialLogBench.newFactorialLog100K    thrpt       20
> 6156.102 ± 278.842  ops/ms
> o.a.c.m.s.FactorialLogBench.oldFactorialLog100     thrpt       20
> 585.576 ±   26.802  ops/ms
> o.a.c.m.s.FactorialLogBench.oldFactorialLog1000    thrpt       20
> 29.883 ±    1.663  ops/ms
> o.a.c.m.s.FactorialLogBench.oldFactorialLog10K     thrpt       20
> 2.744 ±   0.095  ops/ms
> o.a.c.m.s.FactorialLogBench.oldFactorialLog100K    thrpt       20
> 0.264 ±   0.007  ops/ms
>
> Here *100 calculates the factorial of a random number in range [0..99],
> *1000 -- of a number in range [900..999], *10K -- of a number in range
> [9900..9999], and *100K -- of a number in range [99900..99999]. The default
> tabulated version covers the first two cases with a direct lookup, the
> third one with a use-the-closest-stored-value approach (about two logs, two
> multiplications and two additions on average), and the fourth one delegates
> to logGamma.
> As you can see, the tabulated (new) approach is about 130x faster on small
> numbers and the difference grows drastically from there on, up to about
> 23000x on fairly big numbers.
>
> Another interesting result is that logGamma doesn't seem to be slower than
> the assisted lookup, which possibly indicates that we can drop the assisted
> lookup altogether, reducing the memory footprint.
>
> Sincerely,
> Aleksei Dievskii.
>
> On Thu, Nov 19, 2015 at 11:18 AM, Gilles <gi...@harfang.homelinux.org>
> wrote:
>
>> Hello.
>>
>> On Thu, 19 Nov 2015 10:34:46 +0100, Alexey Dievsky wrote:
>>
>>> Hi,
>>>
>>> I recently submitted an issue to JIRA (
>>> https://issues.apache.org/jira/browse/MATH-1293) and received an advice
>>> to
>>> discuss it here.
>>>
>>> In short, the current implementation of CombinatoricsUtils#factorialLog
>>> employs direct summation and thus has linear complexity (O(n) additions
>>> and
>>> O(n) logs). I suggest replacing it with a tabulated lookup. This solution
>>> will need to allocate some additional memory, but the speed-up for
>>> larger n
>>> would be tremendous. For the values that are too large to be looked up,
>>> equivalent Gamma#logGamma is still much faster than the existing
>>> algorithm.
>>>
>>> I attached a proposed patch to the issue. However, there are a lot of
>>> details that need to be worked out, first and foremost whether such
>>> optimization is at all worth it.
>>>
>>
>> According to your statement above (significantly improved performance),
>> there
>> is actually no doubt that the feature should be offered to CM users.[1][2]
>> The main question is whether a single speed vs memory (vs possibly
>> increased
>> initialization time) trade-off is always acceptable.
>>
>> Hence at first sight, I'd prefer to provide a factory to create an
>> object (that will compute the factorial) with a user-defined trade-off.
>>
>> Regards,
>> Gilles
>>
>> [1] It would also be nice if you could provide a benchmark.
>> [2] A real-life use-case would also be nice to enhance the CM userguide.
>>
>> Thus I ask for your opinion. Please let me
>>> know what think.
>>>
>>> Sincerely,
>>> Aleksei Dievskii.
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>

Re: Fwd: [Math] Tabulated logarithmic factorial

Posted by Alexey Dievsky <di...@gmail.com>.
Hi Gilles,

thanks for your response. I've run a small JMH benchmark of both old
(direct summation) and new (default 16KB tabulation) approaches, and here
are the results:

Benchmark                                           Mode  Samples
Score      Error   Units
o.a.c.m.s.FactorialLogBench.newFactorialLog100     thrpt       20
75066.697 ± 7122.980  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog1000    thrpt       20
76905.243 ± 5965.782  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog10K     thrpt       20  5770.486
± 285.069  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog100K    thrpt       20  6156.102
± 278.842  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog100     thrpt       20
585.576 ±   26.802  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog1000    thrpt       20
29.883 ±    1.663  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog10K     thrpt       20     2.744
±   0.095  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog100K    thrpt       20     0.264
±   0.007  ops/ms

Here *100 calculates the factorial of a random number in range [0..99],
*1000 -- of a number in range [900..999], *10K -- of a number in range
[9900..9999], and *100K -- of a number in range [99900..99999]. The default
tabulated version covers the first two cases with a direct lookup, the
third one with a use-the-closest-stored-value approach (about two logs, two
multiplications and two additions on average), and the fourth one delegates
to logGamma.
As you can see, the tabulated (new) approach is about 130x faster on small
numbers and the difference grows drastically from there on, up to about
23000x on fairly big numbers.

Another interesting result is that logGamma doesn't seem to be slower than
the assisted lookup, which possibly indicates that we can drop the assisted
lookup altogether, reducing the memory footprint.

Sincerely,
Aleksei Dievskii.

On Thu, Nov 19, 2015 at 11:18 AM, Gilles <gi...@harfang.homelinux.org>
wrote:

> Hello.
>
> On Thu, 19 Nov 2015 10:34:46 +0100, Alexey Dievsky wrote:
>
>> Hi,
>>
>> I recently submitted an issue to JIRA (
>> https://issues.apache.org/jira/browse/MATH-1293) and received an advice
>> to
>> discuss it here.
>>
>> In short, the current implementation of CombinatoricsUtils#factorialLog
>> employs direct summation and thus has linear complexity (O(n) additions
>> and
>> O(n) logs). I suggest replacing it with a tabulated lookup. This solution
>> will need to allocate some additional memory, but the speed-up for larger
>> n
>> would be tremendous. For the values that are too large to be looked up,
>> equivalent Gamma#logGamma is still much faster than the existing
>> algorithm.
>>
>> I attached a proposed patch to the issue. However, there are a lot of
>> details that need to be worked out, first and foremost whether such
>> optimization is at all worth it.
>>
>
> According to your statement above (significantly improved performance),
> there
> is actually no doubt that the feature should be offered to CM users.[1][2]
> The main question is whether a single speed vs memory (vs possibly
> increased
> initialization time) trade-off is always acceptable.
>
> Hence at first sight, I'd prefer to provide a factory to create an
> object (that will compute the factorial) with a user-defined trade-off.
>
> Regards,
> Gilles
>
> [1] It would also be nice if you could provide a benchmark.
> [2] A real-life use-case would also be nice to enhance the CM userguide.
>
> Thus I ask for your opinion. Please let me
>> know what think.
>>
>> Sincerely,
>> Aleksei Dievskii.
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: Fwd: [Math] Tabulated logarithmic factorial

Posted by Gilles <gi...@harfang.homelinux.org>.
Hello.

On Thu, 19 Nov 2015 10:34:46 +0100, Alexey Dievsky wrote:
> Hi,
>
> I recently submitted an issue to JIRA (
> https://issues.apache.org/jira/browse/MATH-1293) and received an 
> advice to
> discuss it here.
>
> In short, the current implementation of 
> CombinatoricsUtils#factorialLog
> employs direct summation and thus has linear complexity (O(n) 
> additions and
> O(n) logs). I suggest replacing it with a tabulated lookup. This 
> solution
> will need to allocate some additional memory, but the speed-up for 
> larger n
> would be tremendous. For the values that are too large to be looked 
> up,
> equivalent Gamma#logGamma is still much faster than the existing 
> algorithm.
>
> I attached a proposed patch to the issue. However, there are a lot of
> details that need to be worked out, first and foremost whether such
> optimization is at all worth it.

According to your statement above (significantly improved performance), 
there
is actually no doubt that the feature should be offered to CM 
users.[1][2]
The main question is whether a single speed vs memory (vs possibly 
increased
initialization time) trade-off is always acceptable.

Hence at first sight, I'd prefer to provide a factory to create an
object (that will compute the factorial) with a user-defined trade-off.

Regards,
Gilles

[1] It would also be nice if you could provide a benchmark.
[2] A real-life use-case would also be nice to enhance the CM 
userguide.

> Thus I ask for your opinion. Please let me
> know what think.
>
> Sincerely,
> Aleksei Dievskii.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org