You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@commons.apache.org by lu...@apache.org on 2008/04/23 23:37:09 UTC

svn commit: r651074 - in /commons/sandbox/nabla/trunk/src/site/xdoc: index.xml internals.xml usage.xml

Author: luc
Date: Wed Apr 23 14:37:08 2008
New Revision: 651074

URL: http://svn.apache.org/viewvc?rev=651074&view=rev
Log:
improved documentation
the developers-oriented documentation has been started

Modified:
    commons/sandbox/nabla/trunk/src/site/xdoc/index.xml
    commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml
    commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml

Modified: commons/sandbox/nabla/trunk/src/site/xdoc/index.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/index.xml?rev=651074&r1=651073&r2=651074&view=diff
==============================================================================
--- commons/sandbox/nabla/trunk/src/site/xdoc/index.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/index.xml Wed Apr 23 14:37:08 2008
@@ -24,7 +24,7 @@
 
   <body>
 
-    <section name="Introduction" href="introduction">
+    <section name="Introduction">
       <p>
         Nabla is an automatic differentiator for mathematical functions.
         Just like the mathematical Nabla operator transforms a function
@@ -72,7 +72,7 @@
 
     </section>
 
-    <section name="Example" href="example">
+    <section name="Example">
 
       <p>
         The following example should explain better what Nabla can do
@@ -157,7 +157,7 @@
 
     </section>
 
-    <section name="How ?" href="how">
+    <section name="How ?">
 
       <p>
         The previous example shows that Nabla creates an object that

Modified: commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml?rev=651074&r1=651073&r2=651074&view=diff
==============================================================================
--- commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml Wed Apr 23 14:37:08 2008
@@ -23,31 +23,162 @@
   </properties>
 
   <body>
-    <section name="Principles" href="principles">
-        <subsection name="Symbolic differentiation" href="symbolic-differentiation">
-        </subsection>
-        <subsection name="Validity" href="validity">
-        </subsection>
-        <subsection name="Virtual machine execution model" href="jvm">
-            <subsubsection name="Frame" href="frame">
-            </subsubsection>
-            <subsubsection name="Bytecode instructions" href="instructions">
-            </subsubsection>
-        </subsection>
+    <section name="Principles">
+
+      <subsection name="Symbolic differentiation">
+        <p>
+          Nabla computes the derivatives using symbolic differentiation. Considering
+          a typical call to a univariate function:
+          <pre><code>double r = f(t);</code></pre>
+          Nabla tracks the data flow that leads from the <code>t</code> parameter to the
+          <code>r</code> result to understand how this result is computed. All the
+          operations leading from <code>t</code> to <code>r</code> belong to a small set
+          of instructions corresponding to the capabilities of the virtual machine. This
+          set contains basic arithmetic operations (addition, subtraction ...), conversion
+          operations (double to int, long to double ...), storage instructions (local
+          variables, functions parameters, instance or class fields ...) and calls to
+          elementary functions defined in the <code>Math</code> and <code>StrictMath</code>
+          classes. There is really nothing more!
+        </p>
+        <p>
+          For each one of these basic computer instruction, we know how to map it to a
+          mathematical equation and we can combine this equation with its derivative to
+          form a pair of equations we will use later. For example, a <code>DADD</code>
+          bytecode instruction corresponds to the addition of two real numbers and
+          produces a third number which is their sum. So we map the instruction to the
+          equation:
+          <pre><code>c=a+b</code></pre>
+          and we combine this with its derivative to form the pair:
+          <pre><code>(c=a+b, c'=a'+b')</code></pre>
+          In this example, we have simply used the linearity property of differentiation
+          which implies that the derivative of a sum is the sum of the derivatives.
+          Similar rules exist for all arithmetic instructions, and the derivative of all
+          basic functions in the <code>Math</code> and <code>StrictMath</code> is known.
+          The complete rules set is described in the <a
+          href="#Differentiation rules">Differentiation rules</a> section below.
+        </p>
+        <p>
+          So the original computation path from <code>t</code> to <code>r</code> can be
+          expanded by conceptually replacing all single equations that constitute the
+          code by pairs of equations. The first element of each pair is a simple copy
+          of the original equation. The original computation path was fed by a
+          single double value (the function parameter <code>t</code>), but the expanded
+          computation path needs a pair of values, one for each element in the equations
+          pair. The first element of the pair of input values will be the value of the
+          parameter, and the second will be the <em>derivative of the parameter with respect
+          to the free variable</em>. This means that in our case, if we want to compute the
+          derivative with respect to <code>t</code>, the second element will be the
+          derivative of <code>t</code> with respect to itself, which is simply the constant 1.
+          In this case, we need to feed the computation path with the pair <code>(t, 1)</code>.
+        </p>
+        <p>
+          Without changing anything to the analysis or to the expansion of the computation path,
+          we can also handle the case where <code>t</code> is not an independent variable but is
+          itself a function of another free variable, i.e. a case where:
+          <pre><code>t = g(x)</code>, with <code>dt/dx = g'(x)</code></pre>
+          In this case, we would feed the computation path with the pair <code>(g(x), g'(x))</code>
+          instead of the pair <code>(t, 1)</code>. This allows to handle functions composition,
+          including resursive calls.
+        </p>
+      </subsection>
+
+      <subsection name="Validity">
+        <p>
+          What is the validity of this approach?
+        </p>
+        <p>
+          For straightforward smooth functions, the expanded code really computes both the
+          value of the equation and its exact derivative. This is a simple application of
+          the differentiation rules. So the accuracy of the derivative will be in par with
+          the accuracy of the initial function. If the initial function is a good model
+          of a physical process, the derivative will be a good evaluation of its evolution.
+          If the initial function is only an approximation, the derivative will be an
+          approximation too, but <em>an approximation that is consistent with the initial
+          function up to computer accuracy</em>.
+        </p>
+        <p>
+          As soon as the initial function is not smooth, then some design choices are
+          involved which have an impact on validity. These choices have been made in
+          such a way that in some sense, the result is still as valid, as accurate and as
+          consistent with the initial function as for smooth functions.
+        </p>
+        <p>
+          First of all, what are non-smooth functions in our model, which is based on the
+          operations and functions available in the Java virtual machine? These are either
+          functions that involve calls to non-smooth functions of the <code>Math</code> and
+          <code>StrictMath</code> classes near their singularity points (for example
+          <code>Math.abs</code>, <code>Math.sqrt</code> or <code>Math.log</code> near zero),
+          or functions for which the computation path includes conditional branches
+          involving computed double parameters (for example convergence loops or <code>if</code>,
+          <code>then</code>, <code>else</code> statements). This does <em>neither</em> include
+          functions that use only unconditional jumps <em>nor</em> loops with a number of iterations
+          not related to a computed double parameter. Such code could theoretically be
+          reorganized and loops unrolled to produce a (perhaps huge) straightforward smooth
+          computation path.
+        </p>
+        <p>
+          For singularities corresponding to domain definition boundaries (like
+          <code>Math.sqrt</code> and <code>Math.log</code> which cannot be computed for
+          negative parameters), the theoretical derivative is defined only on the side of the
+          singularity where the function itself is defined. The value of this half-derivative
+          is the limit value of the derivative when approaching the singularity. Since for these
+          functions we use the expression of the derivative that is valid where the function
+          is valid, our computation is consistent with the theoretical mathematical definition.
+          For example in both the <code>Math.sqrt</code> and <code>Math.log</code>, the
+          derivative is infinite at zero, with the proper sign according to the sign of the
+          input derivative.
+        </p>
+        <p>
+          For singularities not related to domain definition boundaries (like
+          <code>Math.abs</code> and conditional branches), the theoretical derivative is not
+          defined as a single value, but as a pair of left and a right half-derivatives, one for
+          each side of the singularity. Since there is little support in the IEEE754 standard
+          to distinguish the left and right hand side of a single value (except for zero, since
+          -0 and +0 both exist), we have decided to adopt a simplified approach. These cases are
+          implemented by simple conditional branches (we added explicitly such a conditional in the
+          <code>Math.abs</code> case). Nabla then simply computes the value of the smooth
+          derivative on the branch of the computation path that is selected at run time, depending
+          on the values of the input parameters. This choice allows to preserve the property of
+          having a derivative that is always consistent with the associated value, and it is a simple
+          arbitrary choice of one of the two possibilities that correspond to the mathematical result,
+          which by itself does not choose between them.
+        </p>
+      </subsection>
+
+      <subsection name="Virtual machine execution model">
+        <subsubsection name="Frame">
+        </subsubsection>
+
+        <subsubsection name="Bytecode instructions">
+        </subsubsection>
+      </subsection>
+
     </section>
-    <section name="Implementation" href="implementation">
-        <subsection name="Differential pairs" href="differential-pairs">
-        </subsection>
-        <subsection name="Bytecode transforms" href="bytecode-transforms">
-        </subsection>
-        <subsection name="Complete differentiation example" href="example">
-        </subsection>
+
+    <section name="Implementation">
+
+      <subsection name="Differential pairs">
+      </subsection>
+
+      <subsection name="Bytecode transforms">
+      </subsection>
+
+      <subsection name="Differentiation rules">
+      </subsection>
+
+      <subsection name="Complete differentiation example">
+      </subsection>
+
     </section>
-    <section name="Issues" href="issues">
-        <subsection name="Singularities handling" href="singularities">
-        </subsection>
-        <subsection name="Data flow and control flow analysis" href="flows-analysis">
-        </subsection>
+
+    <section name="Issues">
+
+      <subsection name="Singularities handling">
+      </subsection>
+
+      <subsection name="Data flow and control flow analysis">
+      </subsection>
+
     </section>
   </body>
 </document>

Modified: commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml?rev=651074&r1=651073&r2=651074&view=diff
==============================================================================
--- commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml Wed Apr 23 14:37:08 2008
@@ -23,7 +23,7 @@
   </properties>
 
   <body>
-    <section name="Public API" href="api">
+    <section name="Public API">
       <p>
         Nabla public interface is very small, it is composed of only three
         interfaces and two classes.
@@ -85,7 +85,7 @@
 
     </section>
 
-    <section name="Updating the base and derived objects" href="updating">
+    <section name="Updating the base and derived objects">
 
       <p>
         One important thing to note is a consequence of the fact that the

Re: svn commit: r651074 - in /commons/sandbox/nabla/trunk/src/site/xdoc: index.xml internals.xml usage.xml

Posted by Phil Steitz <ph...@gmail.com>.

On Fri, Apr 25, 2008 at 2:21 PM, Luc Maisonobe <Lu...@free.fr> wrote:
> Phil Steitz a écrit :
>
>
>
> > On Wed, Apr 23, 2008 at 2:37 PM,  <lu...@apache.org> wrote:
> >
> > > Author: luc
> > >  Date: Wed Apr 23 14:37:08 2008
> > >  New Revision: 651074
> > >
> > >  URL: http://svn.apache.org/viewvc?rev=651074&view=rev
> > >  Log:
> > >  improved documentation
> > >  the developers-oriented documentation has been started
> > >
> >
> > Thanks, Luc!
> > <snip/>
> >
> >
> >
> > >  +        <p>
> > >  +          For singularities not related to domain definition
> boundaries (like
> > >  +          <code>Math.abs</code> and conditional branches), the
> theoretical derivative is not
> > >  +          defined as a single value, but as a pair of left and a right
> half-derivatives, one for
> > >  +          each side of the singularity. Since there is little support
> in the IEEE754 standard
> > >  +          to distinguish the left and right hand side of a single
> value (except for zero, since
> > >  +          -0 and +0 both exist), we have decided to adopt a simplified
> approach. These cases are
> > >  +          implemented by simple conditional branches (we added
> explicitly such a conditional in the
> > >  +          <code>Math.abs</code> case). Nabla then simply computes the
> value of the smooth
> > >  +          derivative on the branch of the computation path that is
> selected at run time, depending
> > >  +          on the values of the input parameters. This choice allows to
> preserve the property of
> > >  +          having a derivative that is always consistent with the
> associated value, and it is a simple
> > >  +          arbitrary choice of one of the two possibilities that
> correspond to the mathematical result,
> > >  +          which by itself does not choose between them.
> > >  +        </p>
> > >
> >
> > The problem here is that it is not an "arbitrary choice" between the
> > two different values - the limit that is the derivative does not
> > exist.  It would make more sense to me to return NaN or throw IAE in
> > these cases.  Is that tractable?  Moreover, is it tractable to
> > consistently define differentiability and throw an appopriate
> > exception or return NaN at points where a java-defined function is not
> > differentiable?
> >
>
>  I understand your concerns. I don't think however it would be feasible to
> detect these cases and process them specifically, be it by returning NaN or
> throwing an exception.
>
>  First, we would have to add branches to the flow of control, to add an
> equality test like this:
>
>   if (x < 0) f(x) else g(x)
>   would become
>   if (x < 0) {f(x),f'(x)} else if (x == 0) {f(0),NaN} else {g(x),g'(x)}
>
>  This would really be hard. It would also not work since it would break for
> the following code, when differentiating either with respect to x or y:
>
>   double r = Math.sqrt(x * x + y * y)
>   if (x < 0) {
>     return 2 * Math.atan(y / (r + x));
>   else if (y < 0) {
>     return -Math.PI - 2 * Math.atan(y / (r - x));
>   } else {
>     return Math.PI - 2 * Math.atan(y / (r - x));
>   }
>
>  This code is in fact a poor man implementation of Math.atan2(y, x) for x
> and y not simultaneously null. Despite it has two different branches for
> positive and negative values of x, the function and all its derivatives are
> continuous across this test. There is a small overlap where both expressions
> yield to the same result, the branches are only here to avoid singularities
> far from x = 0 (singularities at x = +/-y).
>
>  In this case, we would introduce a special handling and a NAN or exception
> that would really be wrong. The same sort of things would occur for example
> in tabulated functions where algorithms take care to preserve smoothness at
> sampling points despite control flow branches are split at these points.
>
>
>
> >
> > We should at least document the behavior in the javadoc in any case.
> >
>
>  Yes, we sould document it in Javadoc but also in user documentation and
> developers documentation. I need to rewrite these sections.
>
>  Do you agree with this ?

To be honest at this point I am not sure I understand fully the
consequences of making choices at points where functions are not
differentiable.  It could be that for practical purposes, this is not
a big deal, but it could also be that in some cases bad things could
happen because of funny behavior around cut points.  You are probably
in a better position than me to judge on this.

Documentation will be a little messy.  Seems we will have to write up
something like a recursive definition of the derived derivative
function including specification of what happens at singularities of
the JDK functions and conditionally defined functions.  Does that make
sense?

Unfortunately, even given a complete definition, users will have to
decompile functions to trace derivative computations and in theory
these could be different under different JDK versions.  Is that true?
Is it material?

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: svn commit: r651074 - in /commons/sandbox/nabla/trunk/src/site/xdoc: index.xml internals.xml usage.xml

Posted by Luc Maisonobe <Lu...@free.fr>.

Phil Steitz a écrit :
> On Wed, Apr 23, 2008 at 2:37 PM,  <lu...@apache.org> wrote:
>> Author: luc
>>  Date: Wed Apr 23 14:37:08 2008
>>  New Revision: 651074
>>
>>  URL: http://svn.apache.org/viewvc?rev=651074&view=rev
>>  Log:
>>  improved documentation
>>  the developers-oriented documentation has been started
> 
> Thanks, Luc!
> <snip/>
> 
> 
>>  +        <p>
>>  +          For singularities not related to domain definition boundaries (like
>>  +          <code>Math.abs</code> and conditional branches), the theoretical derivative is not
>>  +          defined as a single value, but as a pair of left and a right half-derivatives, one for
>>  +          each side of the singularity. Since there is little support in the IEEE754 standard
>>  +          to distinguish the left and right hand side of a single value (except for zero, since
>>  +          -0 and +0 both exist), we have decided to adopt a simplified approach. These cases are
>>  +          implemented by simple conditional branches (we added explicitly such a conditional in the
>>  +          <code>Math.abs</code> case). Nabla then simply computes the value of the smooth
>>  +          derivative on the branch of the computation path that is selected at run time, depending
>>  +          on the values of the input parameters. This choice allows to preserve the property of
>>  +          having a derivative that is always consistent with the associated value, and it is a simple
>>  +          arbitrary choice of one of the two possibilities that correspond to the mathematical result,
>>  +          which by itself does not choose between them.
>>  +        </p>
> 
> The problem here is that it is not an "arbitrary choice" between the
> two different values - the limit that is the derivative does not
> exist.  It would make more sense to me to return NaN or throw IAE in
> these cases.  Is that tractable?  Moreover, is it tractable to
> consistently define differentiability and throw an appopriate
> exception or return NaN at points where a java-defined function is not
> differentiable?

I understand your concerns. I don't think however it would be feasible 
to detect these cases and process them specifically, be it by returning 
NaN or throwing an exception.

First, we would have to add branches to the flow of control, to add an 
equality test like this:

   if (x < 0) f(x) else g(x)
   would become
   if (x < 0) {f(x),f'(x)} else if (x == 0) {f(0),NaN} else {g(x),g'(x)}

This would really be hard. It would also not work since it would break 
for the following code, when differentiating either with respect to x or y:

   double r = Math.sqrt(x * x + y * y)
   if (x < 0) {
     return 2 * Math.atan(y / (r + x));
   else if (y < 0) {
     return -Math.PI - 2 * Math.atan(y / (r - x));
   } else {
     return Math.PI - 2 * Math.atan(y / (r - x));
   }

This code is in fact a poor man implementation of Math.atan2(y, x) for x 
and y not simultaneously null. Despite it has two different branches for 
positive and negative values of x, the function and all its derivatives 
are continuous across this test. There is a small overlap where both 
expressions yield to the same result, the branches are only here to 
avoid singularities far from x = 0 (singularities at x = +/-y).

In this case, we would introduce a special handling and a NAN or 
exception that would really be wrong. The same sort of things would 
occur for example in tabulated functions where algorithms take care to 
preserve smoothness at sampling points despite control flow branches are 
split at these points.

> 
> We should at least document the behavior in the javadoc in any case.

Yes, we sould document it in Javadoc but also in user documentation and 
developers documentation. I need to rewrite these sections.

Do you agree with this ?

Luc

> 
> Phil
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: svn commit: r651074 - in /commons/sandbox/nabla/trunk/src/site/xdoc: index.xml internals.xml usage.xml

Posted by Phil Steitz <ph...@gmail.com>.

On Wed, Apr 23, 2008 at 2:37 PM,  <lu...@apache.org> wrote:
> Author: luc
>  Date: Wed Apr 23 14:37:08 2008
>  New Revision: 651074
>
>  URL: http://svn.apache.org/viewvc?rev=651074&view=rev
>  Log:
>  improved documentation
>  the developers-oriented documentation has been started

Thanks, Luc!
<snip/>
>


>  +        <p>
>  +          For singularities not related to domain definition boundaries (like
>  +          <code>Math.abs</code> and conditional branches), the theoretical derivative is not
>  +          defined as a single value, but as a pair of left and a right half-derivatives, one for
>  +          each side of the singularity. Since there is little support in the IEEE754 standard
>  +          to distinguish the left and right hand side of a single value (except for zero, since
>  +          -0 and +0 both exist), we have decided to adopt a simplified approach. These cases are
>  +          implemented by simple conditional branches (we added explicitly such a conditional in the
>  +          <code>Math.abs</code> case). Nabla then simply computes the value of the smooth
>  +          derivative on the branch of the computation path that is selected at run time, depending
>  +          on the values of the input parameters. This choice allows to preserve the property of
>  +          having a derivative that is always consistent with the associated value, and it is a simple
>  +          arbitrary choice of one of the two possibilities that correspond to the mathematical result,
>  +          which by itself does not choose between them.
>  +        </p>

The problem here is that it is not an "arbitrary choice" between the
two different values - the limit that is the derivative does not
exist.  It would make more sense to me to return NaN or throw IAE in
these cases.  Is that tractable?  Moreover, is it tractable to
consistently define differentiability and throw an appopriate
exception or return NaN at points where a java-defined function is not
differentiable?

We should at least document the behavior in the javadoc in any case.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org