Posted to dev@commons.apache.org by Sébastien Brisard <se...@m4x.org> on 2012/08/30 05:22:16 UTC

[math] Binary or text resource files?

Hi,
testing of special functions involves comparing actual values returned
by CM with expected values computed with arbitrary-precision
software (I use Maxima [1] for this purpose). As I intend these tests
to be assessments of the overall accuracy of our implementations, the
number of test values is quite large. For the time being, I've inlined
the reference values in double[][] arrays, in the test classes. This
clutters the code, and I will move these reference values to resource
files.
In order to limit the size of these files, I'm considering binary
files, the obvious drawback being the lack of readability (for those
of us who haven't entered the Matrix yet).
So what I would propose is to add a readme.txt file in the same resource
file directory, where the content of each binary file would be
detailed.
Would you object to that?
I'm thinking of reserving the *.dat extension for these binary files.
This would entail renaming a few resource files from *.dat (which I had
myself introduced in the optimization.general package) to *.txt. Is
that OK?
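
To make the binary option concrete, here is a minimal sketch of one
possible layout: a row count followed by (x, expected) pairs written as
raw doubles. The class name BinaryReferenceData and the layout itself are
assumptions made for illustration only; nothing of the sort exists in CM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: stores (x, expected) pairs as raw big-endian doubles. */
final class BinaryReferenceData {
    private BinaryReferenceData() {}

    /** Writes the number of rows as an int, then two doubles per row. */
    static byte[] encode(double[][] rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.length);
            for (double[] row : rows) {
                out.writeDouble(row[0]); // argument x
                out.writeDouble(row[1]); // expected value f(x)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back what encode() wrote: the row count, then the pairs. */
    static double[][] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            double[][] rows = new double[n][2];
            for (int i = 0; i < n; i++) {
                rows[i][0] = in.readDouble();
                rows[i][1] = in.readDouble();
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout each pair costs 16 bytes, and the doubles round-trip bit
for bit. In a test, the bytes would presumably come from a stream obtained
with Class.getResourceAsStream rather than from an in-memory array.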

Thanks for your advice,
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Binary or text resource files?

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hi Sébastien.

> right, until we reach a consensus, I've inlined the reference data as
> double[][], trying to keep those arrays to a reasonable size.
> I really would like to provide publicly available, extensive validation
> of all the special functions we have implemented (and will implement). It
> seems to me very important. BOOST does that, which I think sets a standard
> we ought to live up to.

That's certainly a valuable effort. Thanks.

> I'm just not sure what the right way to do this
> is. Surely, using unit tests for this is a bit far-fetched. Gilles was
> talking about a side-project: did you mean a whole new project in
> SANDBOX?

No, I don't think so. I just meant that such reports could form an annex to
the user guide.
As there are already directories like "src/test/java", "src/test/resources",
"src/test/R", there could be something like
  src/test/reports
  src/test/reports/data
  src/test/reports/templates
  src/test/reports/bin
where programs in "bin" would generate reports by reading their input from
"data" and format them according to the "templates" (for example).
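
As an illustration of what a program in "bin" might do, here is a minimal
sketch that measures the error of a function against reference pairs, in
units in the last place (ulps), and formats a one-line summary. The class
AccuracyReport, its methods and the output format are all hypothetical,
and Math.sqrt only stands in for a special function.

```java
import java.util.Locale;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical report generator: summarizes the error of f in ulps. */
final class AccuracyReport {
    private AccuracyReport() {}

    /** Absolute error, expressed in ulps of the expected value. */
    static double errorInUlps(double expected, double actual) {
        return Math.abs(actual - expected) / Math.ulp(expected);
    }

    /** Each row of reference is {x, expected value of f(x)}. */
    static String summarize(String name, DoubleUnaryOperator f, double[][] reference) {
        double max = 0.0;
        double sum = 0.0;
        for (double[] row : reference) {
            double err = errorInUlps(row[1], f.applyAsDouble(row[0]));
            max = Math.max(max, err);
            sum += err;
        }
        double mean = reference.length == 0 ? 0.0 : sum / reference.length;
        return String.format(Locale.ROOT, "%-12s n=%d  max=%.2f ulp  mean=%.2f ulp",
                             name, reference.length, max, mean);
    }

    public static void main(String[] args) {
        // In a real report these pairs would be read from the "data" directory.
        double[][] reference = {{4.0, 2.0}, {0.25, 0.5}, {9.0, 3.0}};
        System.out.println(summarize("Math.sqrt", Math::sqrt, reference));
    }
}
```

A template would then only have to arrange such summary lines, one per
function and per range of the argument.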

> That project would then not be a library, but rather a set of
> executables which produce nicely formatted reports on the accuracy of
> o.a.c.m.special, to be included in the User's guide of CM? Is that
> what you had in mind, Gilles?

Indeed.

There could also be reports about performance (micro-benchmarking) and
actual use-cases.


Best,
Gilles

> What do others think?
> 
> Sébastien
> 


Re: [math] Binary or text resource files?

Posted by Sébastien Brisard <se...@m4x.org>.
Hello,
right, until we reach a consensus, I've inlined the reference data as
double[][], trying to keep those arrays to a reasonable size.
I really would like to provide publicly available, extensive validation
of all the special functions we have implemented (and will implement). It
seems to me very important. BOOST does that, which I think sets a standard
we ought to live up to. I'm just not sure what the right way to do this
is. Surely, using unit tests for this is a bit far-fetched. Gilles was
talking about a side-project: did you mean a whole new project in
SANDBOX? That project would then not be a library, but rather a set of
executables which produce nicely formatted reports on the accuracy of
o.a.c.m.special, to be included in the User's guide of CM? Is that
what you had in mind, Gilles? What do others think?

Sébastien


Re: [math] Binary or text resource files?

Posted by Phil Steitz <ph...@gmail.com>.
On 8/29/12 8:22 PM, Sébastien Brisard wrote:
> Hi,
> testing of special functions involves comparing actual values returned
> by CM with expected values computed with arbitrary-precision
> software (I use Maxima [1] for this purpose). As I intend these tests
> to be assessments of the overall accuracy of our implementations, the
> number of test values is quite large. For the time being, I've inlined
> the reference values in double[][] arrays, in the test classes. This
> clutters the code, and I will move these reference values to resource
> files.
> In order to limit the size of these files, I'm considering binary
> files, the obvious drawback being the lack of readability (for those
> of us who haven't entered the Matrix yet).
> So what I would propose is to add a readme.txt file in the same resource
> file directory, where the content of each binary file would be
> detailed.
> Would you object to that?
> I'm thinking of reserving the *.dat extension for these binary files.
> This would entail renaming a few resource files from *.dat (which I had
> myself introduced in the optimization.general package) to *.txt. Is
> that OK?

Unless the files are really huge, I think it's better to stick with
text files, but as long as you include the readme and keep it up to
date, I would be OK with this. We have a lot of test files in
src/test/resources for various purposes now. I like the ones in
/stat because they include documentation about where they came from
and what they mean in the files themselves.
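
To illustrate the self-documenting style, here is a minimal sketch of a
reader for a text resource whose leading "#" lines describe its origin and
columns. The "#" comment convention, the comma separator and the class
name TextReferenceData are assumptions for this example; they are not
necessarily what the files in /stat use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for comma-separated reference values with '#' comments. */
final class TextReferenceData {
    private TextReferenceData() {}

    /** Returns one double[] per data line; blank and comment lines are skipped. */
    static double[][] parse(String content) {
        List<double[]> rows = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // documentation or spacing, not data
            }
            String[] fields = trimmed.split("\\s*,\\s*");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }
}
```

The header travels with the data, so there is no separate readme to keep
up to date.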

Phil
>
> Thanks for your advice,
> Sébastien


Re: [math] Binary or text resource files?

Posted by Ted Dunning <te...@gmail.com>.
I, in contrast, am quite fond of this technique if there is a clear
correlation between the test and the resource and if the resource is scoped
to testing.

On Thu, Aug 30, 2012 at 5:44 AM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> > This
> > clutters the code, and I will move these reference values to resource
> > files.
>
> I'm not fond of this idea.
> I prefer unit test classes to be self-contained as much as possible.
>

Re: [math] Binary or text resource files?

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Gilles,
thanks for your answer.

2012/8/30 Gilles Sadowski <gi...@harfang.homelinux.org>:
> Hello.
>
>> testing of special functions involves comparing actual values returned
>> by CM with expected values computed with arbitrary-precision
>> software (I use Maxima [1] for this purpose). As I intend these tests
>> to be assessments of the overall accuracy of our implementations, the
>> number of test values is quite large.
>
> How large?
>
>> For the time being, I've inlined
>> the reference values in double[][] arrays, in the test classes.
>
> A priori, that's fine.
>
>> This
>> clutters the code, and I will move these reference values to resource
>> files.
>
> I'm not fond of this idea.
> I prefer unit test classes to be self-contained as much as possible.
>
>> In order to limit the size of these files, I'm considering binary
>> files, the obvious drawback being the lack of readability (for those
>> of us who haven't entered the Matrix yet).
>> So what I would propose is to add a readme.txt file in the same resource
>> file directory, where the content of each binary file would be
>> detailed.
>> Would you object to that?
>
> Why do you want to test a very large number of values? Isn't it enough to
> select problematic cases (near boundaries, very small values, very large
> values, etc.)?
>
That would make sense if we follow the path you sketch below.
To play devil's advocate, there is one minor objection, though:
depending on the implementation, problematic values are not always the
same. But that is tractable (we incrementally add problematic values
as new implementations are written, never removing any of the values
previously considered as problematic).
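
A minimal sketch of what such a self-contained table of problematic values
could look like, with Math.sqrt standing in for a special function (an
assumption made so that the expected values are exactly known). The class
BoundaryCases and the ulp-based tolerance are hypothetical.

```java
/** Hypothetical self-contained check over a small table of problematic arguments. */
final class BoundaryCases {
    private BoundaryCases() {}

    /**
     * Rows are {x, expected sqrt(x)}. Each expected value is exact because x is
     * zero, one, or an even power of two. Rows are only ever appended, so cases
     * that were problematic for an earlier implementation stay covered.
     */
    static final double[][] REFERENCE = {
        {0.0, 0.0},                                     // lower boundary
        {Double.MIN_NORMAL, Math.scalb(1.0, -511)},     // very small: 2^-1022 -> 2^-511
        {1.0, 1.0},
        {Math.scalb(1.0, 1022), Math.scalb(1.0, 511)},  // very large: 2^1022 -> 2^511
    };

    /** Counts the rows whose error exceeds maxUlps (a NaN error counts as a failure). */
    static int countFailures(double[][] reference, double maxUlps) {
        int failures = 0;
        for (double[] row : reference) {
            double actual = Math.sqrt(row[0]);
            double err = Math.abs(actual - row[1]) / Math.ulp(row[1]);
            if (!(err <= maxUlps)) {
                failures++;
            }
        }
        return failures;
    }
}
```

A unit test would then simply assert that countFailures(REFERENCE, tol)
is zero for whatever tolerance the implementation claims.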

> I'm not sure that unit tests should aim at testing all values exhaustively.
> That might be a side project, maybe to be included in the user guide (?).
>
I like this idea very much. In fact I would like to be able to provide
the user with a full report on accuracy, for specific ranges of the
argument (if relevant). Doing so in the user guide would give us the
opportunity to include graphs, which might help. The only objection I
would have is the following: surefire reports are generated
automatically, so that the impact on accuracy of any change in the
implementation gets reported immediately.

Maybe we could have a side project which gets automatically run as
part of the build cycle, and which produces human readable reports. I
lack practical experience in this field, so I would welcome any
suggestion.

Sébastien


Re: [math] Binary or text resource files?

Posted by Sébastien Brisard <se...@m4x.org>.
2012/8/30 Gilles Sadowski <gi...@harfang.homelinux.org>:
> Hello.
>
>> testing of special functions involves comparing actual values returned
>> by CM with expected values computed with arbitrary-precision
>> software (I use Maxima [1] for this purpose). As I intend these tests
>> to be assessments of the overall accuracy of our implementations, the
>> number of test values is quite large.
>
> How large?
>
For example, testing of GammaDistribution involves some *.csv files,
the largest of which is 300 kB. This might not be so huge in the end
(although the svn commit log is quite large!).

BTW, you might notice that I've committed this morning a new *.csv
file. I'm not taking decisions without waiting for everyone's assent,
I'm merely extending some tests which were already using resource
files. If we decide for another option, I'll change all of this
together.

Best regards,
Sébastien


Re: [math] Binary or text resource files?

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hello.

> testing of special functions involves comparing actual values returned
> by CM with expected values computed with arbitrary-precision
> software (I use Maxima [1] for this purpose). As I intend these tests
> to be assessments of the overall accuracy of our implementations, the
> number of test values is quite large.

How large?

> For the time being, I've inlined
> the reference values in double[][] arrays, in the test classes.

A priori, that's fine.

> This
> clutters the code, and I will move these reference values to resource
> files.

I'm not fond of this idea.
I prefer unit test classes to be self-contained as much as possible.

> In order to limit the size of these files, I'm considering binary
> files, the obvious drawback being the lack of readability (for those
> of us who haven't entered the Matrix yet).
> So what I would propose is to add a readme.txt file in the same resource
> file directory, where the content of each binary file would be
> detailed.
> Would you object to that?

Why do you want to test a very large number of values? Isn't it enough to
select problematic cases (near boundaries, very small values, very large
values, etc.)?

I'm not sure that unit tests should aim at testing all values exhaustively.
That might be a side project, maybe to be included in the user guide (?).

> I'm thinking of reserving the *.dat extension for these binary files.
> This would entail renaming a few resource files from *.dat (which I had
> myself introduced in the optimization.general package) to *.txt. Is
> that OK?

Let's first decide about the break-up...


Regards,
Gilles
