Posted to dev@commons.apache.org by Gilles Sadowski <gi...@harfang.homelinux.org> on 2011/09/14 23:02:45 UTC

Re: [Math] FastMath preset tables

Hello.

> >
> >People taking part to this discussion[1] seem to have a hard time being
> >explicit about what they are trying to achieve.
> >
> >(1)
> >From information gathered so far, the issue raised seems to have been solved
> >by taking advantage of the fact that the JVM loads classes at first use (i.e
> >methods will not be delayed by the loading of tables that they don't use).
> >
> >At least, this is what I conclude from my tests that compare code with and
> >without preset tables (which differ by less than 50 ms). [This has yet to be
> >confirmed by the initial poster who reported an unexpected difference of 30
> >microseconds!]
> 
> According to Alexis' post from today, the loading time is rather 5ms
> with one setting and 182ms with the reverse setting, so there is a
> large factor (36 times).

The factor is large, but it does not really matter because you
multiply it by 1 (one): the gain is one-shot. That's why I think that it is
relevant to ask whether the application at hand will be restarted several
times per second.

> The initial low times are explained by his benchmark which simply
> computed class load time. With the initial situation, this was fair
> as it involved computing the tables, but with the current code it
> was not representative anymore since it did not even load the
> tables, as they are loaded on demand and on a per table basis.

That's what I thought (cf. the JIRA page).
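The per-table, on-demand loading described above relies on the JVM only initializing a nested class at its first use. A minimal sketch of that holder idiom follows; the class name, table size, and placeholder computation are illustrative, not FastMath's actual tables:

```java
public final class LazyTables {
    // The nested class is not initialized until first access, so the
    // table is only computed when expTable() is actually called.
    private static final class ExpTable {
        static final double[] TABLE = compute();

        private static double[] compute() {
            double[] t = new double[750];
            for (int i = 0; i < t.length; i++) {
                t[i] = Math.exp(i - 375); // placeholder computation
            }
            return t;
        }
    }

    public static double[] expTable() {
        return ExpTable.TABLE;
    }
}
```

Methods that never touch `expTable()` never pay for the table's initialization, which is why a benchmark that only loads the outer class sees almost no cost.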

> >
> >(2)
> >A _second_ issue has been bundled in the commits related to the initial
> >problem described above: Instead of computing the tables' contents at
> >runtime, they are now set from literal arrays.
> >
> >In addition to being non-consensual, it is still not clear that this change
> >is a necessary step to fix the reported problem.
> 
> I don't understand your point. Tables can be computed at runtime or
> at compile time. Literal arrays are simply the easiest and most
> portable way to have compile-time arrays. Other options would
> involve really tricky steps that would be difficult to get right
> with different build systems. The build systems used currently are
> ant, maven2 and eclipse at least, and these systems are also used by
> Gump and Continuum. I think other people use other IDEs too.
> 
> When we discussed this initially with Sebb, we ruled out such complex
> settings, and decided to go with literal arrays generated once.

If this discussion has taken place here, I must have missed it.
I don't see what complex settings you had considered.
I agree that if tables must be used, they should be generated once, and that
the literal arrays are the simplest.

However, the first and main point is that I could not understand what was
considered to be a "too large" initialization time (cf. my last comment
on the JIRA page).
Reasonable as it was to fix a minute-long startup, I did not think it
reasonable to grab for an additional tenth of a second.

> Is there another option we missed ?

[cf. below.]

> >
> >(3)
> >On a PC, comparing the old "FastMath" code (no IOD, no preset) with the
> >latest version, I get the following timing gain for a single call to
> >"pow" (i.e. a function that _uses_ the tables):
> >   130 ms (preset)
> >    80 ms (no preset)
> >So, indeed, using preset tables does make the first call run faster.
> >[On subsequent calls, the difference is less than 1 microsecond (cf.
> >"FastMathLoadCheck").]
> >
> >The issue is: When do we say that initialization time is too long?
> 
> I think I am lost here. What do you call preset and no preset ?

preset = precomputed tables (aka literal arrays)
no preset = tables computed at runtime

> >
> >On this machine:
> >   Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
> >the difference is around 50 ms. Is that too long?
> >This will most probably be swamped in the execution time of any useful
> >application and in my opinion does not justify the workaround currently in
> >trunk.
> >
> >The slowness reported initially (9 seconds to ~1 minute on a "low-end"
> >device) is indeed excessive.
> >But can we please draw the line at some meaningful value instead of
> >prematurely over-optimizing for a one-shot gain?
> 
> I agree with you. From direct experience with the Android
> application, I experienced a loading delay slightly below one
> minute. I asked Alexis to do some benchmark and he reported most of
> the time was due to FastMath, then I asked him to open a Jira
> issue.
> 
> I'm not sure anymore which benchmarks are flawed and which
> benchmarks are representative. Unfortunately, due to some low level
> kernel issue, I cannot do any benchmark by myself on my tablet
> except user level timing (I can't connect my tablet to my computer
> and run it in a monitored mode). So I am waiting for a new version
> of the complete application for such user-level timings, and cannot
> do anything about sub-second precise timings. I am sorry for that.

I own neither a tablet nor a smartphone, and so was not able to perform
the same check as I did on my machine (with the "PerfTestUtils" class in
the "test" area of the repository). Hence I could only wonder about what was
being timed in the report by Alexis and how to relate it to what I was
observing here. [OTOH, the results from Sebb's benchmark were perfectly
compatible.]

Can you use "PerfTestUtils" on your devices? This is like using CM, so I
don't think that any kernel issue should prevent it.
I'll post the Java file I used to benchmark calls to "pow" on the JIRA page.
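The check in question is essentially first-call versus steady-state timing. A standalone approximation is sketched below; it uses StrictMath as a stand-in since the benchmarked FastMath build is not shown here, and PerfTestUtils itself has a different API:

```java
public final class FirstCallTiming {
    // Time the first call to a math function (which includes class
    // loading and any table setup) and a subsequent call, in nanoseconds.
    static long[] timeFirstAndSecond() {
        long start = System.nanoTime();
        double r = StrictMath.pow(2.5, 3.5);
        long first = System.nanoTime() - start;

        start = System.nanoTime();
        r += StrictMath.pow(2.5, 3.5);
        long second = System.nanoTime() - start;

        // Use the result so the calls cannot be optimized away.
        if (Double.isNaN(r)) throw new AssertionError();
        return new long[] { first, second };
    }

    public static void main(String[] args) {
        long[] t = timeFirstAndSecond();
        System.out.println("first: " + t[0] + " ns, second: " + t[1] + " ns");
    }
}
```

On a device this only needs user-level timing, which is why no kernel-level monitoring should be required.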

> >
> >(4)
> >Can we also lay out rules about what constitutes an acceptable request for a
> >workaround?
> >
> >I don't think that is OK to just say that "FastMath" is too slow. The master
> >argument here was often that one should provide a (realistic) use case.
> >
> >I see that a faster startup time would benefit an application required to
> >be restarted several times per second. But how realistic would that be?
> 
> This occurs in web services. This is a kind of application we get
> more and more often. I don't know at all how the server handles
> upcoming requests, and in particular if classes are reloaded or not,
> reoptimized or not. I know for sure the JVM is not restarted from
> scratch.

Exactly.
As Ted already pointed out, it would really be impossible to run web services
if the JVM were restarted for each request, or even reloaded the classes.
The "long" initialization of "FastMath" will be done once. And I bet that
the few hundred milliseconds we talk about are completely offset by the
initialization of the rest of the web server machinery.

> Another kind of application we have is small user computation
> (things akin to a pocket calculator). The android application we
> speak about belongs to this category. It is a space flight dynamics
> calculator that performs simple conversions (orbit conversions,
> frames conversions, time conversions, visibility detection,
> spacecraft impulse maneuvers). There is no high-frequency
> repetition, but there are human factors. Typically, if you had
> to wait more than one second to get the results of a
> multiplication in your pocket calculator, you would be upset.

Right.

> Here,
> we have to wait 57 seconds.

Then 100 ms plus or minus won't matter.

> The last benchmarks seem to imply
> FastMath was not the only culprit, despite what was initially
> identified. It is however part of the problem and for the web
> services case I think it is really worth improving its loading time.

As said above, I don't think that preset tables will make any noticeable
difference for a web service. The more so if the computation is inherently
complex and the "incompressible" request time is already beyond a second.

> >And would "FastMath" be the single bottleneck in such a case?
> >Moreover, if there was such need to be able to restart the JVM several times
> >per second, then I'd draw the attention to the fact that "FastMath" is not
> >the right tool: Indeed, for the first call to "pow", it is still about 150
> >times slower than "Math" or "StrictMath". Does that suggest that we must implement
> >some way so that users are able to select whether CM will use "Math" or
> >"FastMath"?
> >
> >(5)
> >On Sun, Sep 11, 2011 at 02:51:31PM +0100, sebb wrote:
> >>[...]
> >>
> >>I don't think minimising the class source file size is nearly as
> >>important as the startup time.
> >>
> >
> >First, it's not only about source size, but also code versus tables.
> >The former is self-descriptive.
> 
> Yes, but we don't expect anybody to read the whole table. Reading
> only some comments above it pointing to the code that was used to
> generate them is sufficient.

Yes, maybe; I'm just pointing out that such arguments as I present are at
least as important as a 100 ms gain at startup, gain that dwindles as time
passes and computers become faster.

> >
> >Second, not only the source file is larger, but so is the bytecode size.
> >Without the preset tables, the ".class" file was 38229 bytes long.
> >With all the changes to accommodate preset tables, there are now 5 ".class"
> >files with the following sizes:
> >   8172  FastMathCalc.class
> >  34671  FastMath.class
> >  35252  FastMath$ExpFracTable.class
> >  49944  FastMath$ExpIntTable.class
> >  39328  FastMath$lnMant.class
> >
> >For the same functionality, this results in more than a four-fold increase
> >in bytecode size.[2]
> 
> Yes, so what ?
> Many performance problems end up with a trade-off between memory and
> execution time. As memory is cheap the current trend is to go to
> large tables in many places. Even inside processors, or any decent
> mathematical functions libraries, there are tables. One of the
> problems with such functions is even named the "table maker dilemma"
> (the term was coined by Kahan if I remember well, who is well known
> for all his work on floating point arithmetic and the IEEE standard).
> 
> I would gladly accept tables up to a few megabytes.

You cannot, as I've pointed out early on in this discussion: Java source
files and code constructs have several size limitations (notably, a single
method's bytecode cannot exceed 64 KB, which caps the size of a literal
array initializer).
That's why I proposed to move those tables to separate source files, with
the advantage that they won't pollute a file that is manually edited. The
table files could be generated by a script.

> For now I would
> be worried by tables larger than several tens of megabytes. However,
> I am convinced that in 3 to 5 years from now I would say otherwise
> and would start saying that megabyte tables are small and worries
> start at gigabytes. Now we have 3 tables and the largest is 50
> kilobytes; this is small and almost fits within many processor
> caches, which are currently of the order of magnitude of 35 kbytes
> for 5-year-old processors (I don't have a more recent processor to
> check newer values).

If going in that direction (not discussing whether this is good or bad),
I would say that we should surely not use literal arrays but look at those
tables as "resources" and load them with the appropriate functionality.
This would then most clearly set them apart from the "real" code.


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] FastMath preset tables

Posted by Ted Dunning <te...@gmail.com>.
Resources are likely to be considerably faster and more compact than class
files.  The issue is that the class files actually compile into code that
inserts values one by one.  Better to just read the whole table in a single
go directly into an array.
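That bulk read can be sketched as below, under the assumption that the table is stored as consecutive big-endian doubles (in practice the bytes would come from something like Class.getResourceAsStream on a file shipped in the jar; names here are illustrative):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public final class TableResource {
    // Write a table as raw big-endian doubles -- the layout a
    // generator would put in the resource file.
    static byte[] toBytes(double[] table) {
        ByteBuffer buf = ByteBuffer.allocate(table.length * Double.BYTES);
        buf.asDoubleBuffer().put(table);
        return buf.array();
    }

    // Read the whole table back in a single bulk get(), instead of
    // executing bytecode that assigns array elements one by one.
    static double[] fromBytes(byte[] bytes) {
        double[] table = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).asDoubleBuffer().get(table);
        return table;
    }

    public static void main(String[] args) {
        double[] back = fromBytes(toBytes(new double[] { 1.0, Math.PI, 0.5 }));
        System.out.println(Arrays.toString(back));
    }
}
```

The round trip is exact because the raw IEEE 754 bits are preserved, which also sidesteps any decimal-formatting concerns a textual table would raise.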

On Thu, Sep 15, 2011 at 9:21 AM, <lu...@free.fr> wrote:

> > If going in that direction (not discussing whether this good or bad),
> > I would say that we should surely not use litteral arrays but look at
> > those
> > tables as "resources" and load them with the appropriate
> > functionality.
> > This would then most clearly set them apart from the "real" code.
>
> If you think another way to load the tables is better, then go for it.
> I really don't care about how it is done. I just want it to be done fast.
> We can use serialized data in the embedded resources if it is fine with
> you.
>

Re: [Math] FastMath preset tables

Posted by lu...@free.fr.
Hi Gilles,

----- Mail original -----
> Hello.
> 
> > >
> > >People taking part to this discussion[1] seem to have a hard time
> > >being
> > >explicit about what they are trying to achieve.
> > >
> > >(1)
> > >From information gathered so far, the issue raised seems to have
> > >been solved
> > >by taking advantage of the fact that the JVM loads classes at
> > >first use (i.e
> > >methods will not be delayed by the loading of tables that they
> > >don't use).
> > >
> > >At least, this is what I conclude from my tests that compare code
> > >with and
> > >without preset tables (which differ by less than 50 ms). [This has
> > >yet to be
> > >confirmed by the initial poster who reported an unexpected
> > >difference of 30
> > >microseconds!]
> > 
> > According to Alexis' post from today, the loading time is rather 5ms
> > with one setting and 182ms with the reverse setting, so there is a
> > large factor (36 times).
> 
> The factor is large, but it does not really matter because you
> multiply it by 1 (one): the gain is one-shot. That's why I think that
> it is
> relevant to ask whether the application at hand will be restarted
> several
> times per second.

OK then.

> 
> > The initial low times are explained by his benchmark which simply
> > computed class load time. With the initial situation, this was fair
> > as it involved computing the tables, but with the current code it
> > was not representative anymore since it did not even load the
> > tables, as they are loaded on demand and on a per table basis.
> 
> That's what I thought (cf. the JIRA page).
> 
> > >
> > >(2)
> > >A _second_ issue has been bundled in the commits related to the
> > >initial
> > >problem described above: Instead of computing the tables' contents
> > >at
> > >runtime, they are now set from literal arrays.
> > >
> > >In addition to being non-consensual, it is still not clear that
> > >this change
> > >is a necessary step to fix the reported problem.
> > 
> > I don't understand your point. Tables can be computed at runtime or
> > at compile time. Literal arrays are simply the easiest and most
> > portable way to have compile-time arrays. Other options would
> > involve really tricky steps that would be difficult to get right
> > with different build systems. The build systems used currently are
> > ant, maven2 and eclipse at least, and these systems are also used
> > by
> > Gump and Continuum. I think other people use other IDEs too.
> > 
> > When we discussed this initially with Sebb, we ruled out such
> > complex
> > settings, and decided to go with literal arrays generated once.
> 
> If this discussion has taken place here, I must have missed it.

Look at the first comments in the Jira issue. My first comment considered
creating the tables at compile time and storing them so they are loaded
at run time without recomputation. The following comment from Sebb said we
should rather print out generated data and then include it. This is basically
what has been done.

> I don't see what complex settings you had considered.
> I agree that if tables must be used, they should be generated once,
> and that
> the litteral arrays are the simplest.

Well, I did not explain this in the comments as Sebb provided a better
solution immediately after my initial proposal.
For the sake of completeness, I considered having something in the
spirit of automatically generated code, something akin to what language
parser generators (antlr) or structured-data parser generators (jibx)
or even model transformers for model-driven architecture (acceleo) do.

Such generators are run in the build process before the compiler, so the
compiler can process the generated code. There is something for this in
maven I think, but you have to do it yourself in ant and you have to tweak
the build system in IDEs like Eclipse. This is complex and really overkill.
In this case, generating this code once by one of the developers running
an application and saving the generated file as if it was hand-written
was a much smarter solution: these tables are only simple large tables.
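The "generate once, commit as source" approach can be sketched as a small printer run by a developer; the method and constant names below are hypothetical, not the actual generator shipped with the library:

```java
public final class TablePrinter {
    // Emit a Java source fragment declaring the table as a literal
    // array. Run once; the output is committed as hand-written source.
    static String toLiteral(String name, double[] table) {
        StringBuilder sb = new StringBuilder();
        sb.append("static final double[] ").append(name).append(" = {\n");
        for (double v : table) {
            // Hexadecimal floating-point literals round-trip exactly,
            // unlike shortest-decimal output.
            sb.append("    ").append(Double.toHexString(v)).append(",\n");
        }
        sb.append("};\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toLiteral("EXP_TABLE_A", new double[] { 1.0, 0.5, Math.PI }));
    }
}
```

Because the output is ordinary Java source, every build system (ant, maven2, Eclipse, Gump, Continuum) compiles it with no extra build step.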

> 
> However, the first and main point is that I could not understand what
> was considered to be a "too large" initialization time (cf. my last
> comment on the JIRA page).
> Reasonable as it was to fix a minute-long startup, I did not think it
> reasonable to grab for an additional tenth of a second.

Yes, I agree. As this can only explain less than one second, the initial
problem must lie elsewhere.

> 
> > Is there another option we missed ?
> 
> [cf. below.]
> 
> > >
> > >(3)
> > >On a PC, comparing the old "FastMath" code (no IOD, no preset)
> > >with the
> > >latest version, I get the following timing gain for a single call
> > >to
> > >"pow" (i.e. a function that _uses_ the tables):
> > >   130 ms (preset)
> > >    80 ms (no preset)
> > >So, indeed, using preset tables does make the first call run
> > >faster.
> > >[On subsequent calls, the difference is less than 1 microsecond
> > >(cf.
> > >"FastMathLoadCheck").]
> > >
> > >The issue is: When do we say that initialization time is too long?
> > 
> > I think I am lost here. What do you call preset and no preset ?
> 
> preset = precomputed tables (aka literal arrays)
> no preset = tables computed at runtime
> 
> > >
> > >On this machine:
> > >   Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
> > >the difference is around 50 ms. Is that too long?
> > >This will most probably be swamped in the execution time of any
> > >useful
> > >application and in my opinion does not justify the workaround
> > >currently in
> > >trunk.
> > >
> > >The slowness reported initially (9 seconds to ~1 minute on a
> > >"low-end"
> > >device) is indeed excessive.
> > >But can we please draw the line at some meaningful value instead
> > >of
> > >prematurely over-optimizing for a one-shot gain?
> > 
> > I agree with you. From direct experience with the Android
> > application, I experienced a loading delay slightly below one
> > minute. I asked Alexis to do some benchmark and he reported most of
> > the time was due to FastMath, then I asked him to open a Jira
> > issue.
> > 
> > I'm not sure anymore which benchmarks are flawed and which
> > benchmarks are representative. Unfortunately, due to some low level
> > kernel issue, I cannot do any benchmark by myself on my tablet
> > except user level timing (I can't connect my tablet to my computer
> > and run it in a monitored mode). So I am waiting for a new version
> > of the complete application for such user-level timings, and cannot
> > do anything about sub-second precise timings. I am sorry for that.
> 
> I own neither a tablet nor a smartphone, and so was not able to
> perform the same check as I did on my machine (with the
> "PerfTestUtils" class in the "test" area of the repository). Hence I
> could only wonder about what was being timed in the report by Alexis
> and how to relate it to what I was observing here. [OTOH, the results
> from Sebb's benchmark were perfectly compatible.]
> 
> Can you use "PerfTestUtils" on your devices? This is like using CM,
> so I
> don't think that any kernel issue should prevent it.
> I'll post the java file I used to benchmark calls to "pow" on the
> JIRA page.

I will see how I can do this. From the underlying code run by Orekit
in this application, I would say that a realistic use case involves the
following computations (these are rough approximations): 30 sines,
30 cosines, 50 square roots, 5 exponentials, 2 natural logarithms, and
10 floor operations.
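That estimated mix is easy to turn into a micro-benchmark. A sketch using java.lang.Math as the baseline follows (swap in FastMath to compare; the input value and loop arguments are arbitrary choices, not part of the original estimate):

```java
public final class RealisticMix {
    // The rough per-use-case mix quoted above: 30 sines, 30 cosines,
    // 50 square roots, 5 exponentials, 2 natural logarithms, 10 floors.
    static double oneRequest(double x) {
        double acc = 0.0;
        for (int i = 0; i < 30; i++) acc += Math.sin(x + i);
        for (int i = 0; i < 30; i++) acc += Math.cos(x + i);
        for (int i = 0; i < 50; i++) acc += Math.sqrt(x + i);
        for (int i = 0; i < 5; i++)  acc += Math.exp(x / (i + 1));
        for (int i = 0; i < 2; i++)  acc += Math.log(x + i + 1);
        for (int i = 0; i < 10; i++) acc += Math.floor(x + i / 3.0);
        return acc;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double r = oneRequest(1.25);
        long elapsed = System.nanoTime() - start;
        System.out.println("result " + r + " in " + elapsed + " ns");
    }
}
```

The first invocation includes any table initialization; repeating the call in a loop gives the steady-state cost of one "pocket calculator" request.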

> 
> > >
> > >(4)
> > >Can we also lay out rules about what constitutes an acceptable
> > >request for a
> > >workaround?
> > >
> > >I don't think that is OK to just say that "FastMath" is too slow.
> > >The master
> > >argument here was often that one should provide a (realistic) use
> > >case.
> > >
> > >I see that a faster startup time would benefit an application
> > >required to
> > >be restarted several times per second. But how realistic would
> > >that be?
> > 
> > This occurs in web services. This is a kind of application we get
> > more and more often. I don't know at all how the server handles
> > upcoming requests, and in particular if classes are reloaded or
> > not,
> > reoptimized or not. I know for sure the JVM is not restarted from
> > scratch.
> 
> Exactly.
> As Ted already pointed out, it would really be impossible to run web
> services if the JVM were restarted for each request, or even reloaded
> the classes.
> The "long" initialization of "FastMath" will be done once. And I bet
> that
> the few hundred milliseconds we talk about are completely offset by
> the
> initialization of the rest of the web server machinery.

If the concerns I had about this are irrelevant, this is good news.

> 
> > Another kind of application we have is small user computation
> > (things akin to a pocket calculator). The android application we
> > speak about belongs to this category. It is a space flight dynamics
> > calculator that performs simple conversions (orbit conversions,
> > frames conversions, time conversions, visibility detection,
> > spacecraft impulse maneuvers). There is no high-frequency
> > repetition, but there are human factors. Typically, if you had
> > to wait more than one second to get the results of a
> > multiplication in your pocket calculator, you would be upset.
> 
> Right.
> 
> > Here,
> > we have to wait 57 seconds.
> 
> Then 100 ms plus or minus won't matter.

I don't want to keep this level of 57 seconds. So once we find
what really causes it and reduce it, we will see what remains. Of
course, if we are still at a few seconds (even as small as 2 or 3
seconds) then 100ms would be negligible.

> 
> > The last benchmarks seem to imply
> > FastMath was not the only culprit, despite what was initially
> > identified. It is however part of the problem and for the web
> > services case I think it is really worth improving its loading
> > time.
> 
> As said above, I don't think that preset tables will make any
> noticeable
> difference for a web service. The more so if the computation is
> inherently
> complex and the "incompressible" request time is already beyond a
> second.
> 
> > >And would "FastMath" be the single bottleneck in such a case?
> > >Moreover, if there was such need to be able to restart the JVM
> > >several times
> > >per second, then I'd draw the attention to the fact that
> > >"FastMath" is not
> > >the right tool: Indeed, for the first call to "pow", it is still
> > >about 150 times
> > >slower than "Math" or "StrictMath". Does that suggest that we must
> > >implement
> > >some way so that users are able to select whether CM will use
> > >"Math" or
> > >"FastMath"?
> > >
> > >(5)
> > >On Sun, Sep 11, 2011 at 02:51:31PM +0100, sebb wrote:
> > >>[...]
> > >>
> > >>I don't think minimising the class source file size is nearly as
> > >>important as the startup time.
> > >>
> > >
> > >First, it's not only about source size, but also code versus
> > >tables.
> > >The former is self-descriptive.
> > 
> > Yes, but we don't expect anybody to read the whole table. Reading
> > only some comments above it pointing to the code that was used to
> > generate them is sufficient.
> 
> Yes, maybe; I'm just pointing out that such arguments as I present
> are at
> least as important as a 100 ms gain at startup, gain that dwindles as
> time
> passes and computers become faster.
> 
> > >
> > >Second, not only the source file is larger, but so is the bytecode size.
> > >Without the preset tables, the ".class" file was 38229 bytes
> > >long.
> > >With all the changes to accommodate preset tables, there are now 5
> > >".class"
> > >files with the following sizes:
> > >   8172  FastMathCalc.class
> > >  34671  FastMath.class
> > >  35252  FastMath$ExpFracTable.class
> > >  49944  FastMath$ExpIntTable.class
> > >  39328  FastMath$lnMant.class
> > >
> > >For the same functionality, this results in more than a four-fold
> > >increase
> > >in bytecode size.[2]
> > 
> > Yes, so what ?
> > Many performance problems end up with a trade-off between memory
> > and
> > execution time. As memory is cheap the current trend is to go to
> > large tables in many places. Even inside processors, or any decent
> > mathematical functions libraries, there are tables. One of the
> > problems with such functions is even named the "table maker
> > dilemma"
> > (the term was coined by Kahan if I remember well, who is well known
> > for all his work on floating point arithmetic and the IEEE standard).
> > 
> > I would gladly accept tables up to a few megabytes.
> 
> You cannot, as I've pointed out early on in this discussion: Java
> source files and code constructs have several size limitations
> (notably, a single method's bytecode cannot exceed 64 KB, which caps
> the size of a literal array initializer).
> That's why I proposed to move those tables to separate source files,
> with the advantage that they won't pollute a file that is manually
> edited. The table files could be generated by a script.
> 
> > For now I would
> > be worried by tables larger than several tens of megabytes.
> > However,
> > I am convinced that in 3 to 5 years from now I would say otherwise
> > and would start saying that megabyte tables are small and worries
> > start at gigabytes. Now we have 3 tables and the largest is 50
> > kilobytes; this is small and almost fits within many processor
> > caches, which are currently of the order of magnitude of 35kbytes
> > for 5 years old processors (I don't have a more recent processor to
> > check newer values).
> 
> If going in that direction (not discussing whether this is good or bad),
> I would say that we should surely not use literal arrays but look at
> those
> tables as "resources" and load them with the appropriate
> functionality.
> This would then most clearly set them apart from the "real" code.

If you think another way to load the tables is better, then go for it.
I really don't care about how it is done. I just want it to be done fast.
We can use serialized data in the embedded resources if it is fine with you.

best regards,
Luc

> 
> 
> Regards,
> Gilles
