Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2021/08/19 21:58:02 UTC

How to run performance tests?

Hi Folks,

I want to measure the time it takes Daffodil to read in the input file, parse it, and write the XML to a file. How do I do that from the command line?

Something like this, I imagine:

     	daffodil.bat -performance ???

/Roger

Re: How to run performance tests?

Posted by Steve Lawrence <sl...@apache.org>.
An additional clarification: the times reported by the performance command
do not include reading from disk, but do include reading from the in-memory
storage.

When testing parse performance, the time includes both parsing and
outputting the infoset to XML. Different infoset types have
different overheads, so if you're curious about just the time to parse
and not about the infoset output, you can use the "-I null" option (same
as with the "parse" subcommand). For example:

  daffodil performance -s foo.dfdl.xsd -N 10 -I null foo.dat
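
To put a rough number on the infoset overhead, one approach is to run the
performance command twice, once with the default infoset output and once
with "-I null", and compare the per-file times. The sketch below only does
the arithmetic; the function names ms_per_file and infoset_overhead_ms are
made up for illustration, and the rates are assumed to be "avg rate" values
copied by hand from the two runs.

```shell
#!/bin/sh
# Convert a rate in files/sec into milliseconds per file.
ms_per_file() {
  awk -v rate="$1" 'BEGIN { printf "%.3f\n", 1000 / rate }'
}

# Estimated cost of infoset output per file, in milliseconds.
#   $1 = avg rate (files/sec) with the default infoset output
#   $2 = avg rate (files/sec) with -I null
infoset_overhead_ms() {
  awk -v full="$1" -v bare="$2" \
    'BEGIN { printf "%.3f\n", 1000 / full - 1000 / bare }'
}
```

For example, "ms_per_file 50" prints 20.000, and "infoset_overhead_ms 50 100"
prints 10.000, meaning infoset output would account for about 10 ms of each
20 ms parse in that made-up case.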

When testing unparse performance, the time includes both reading the
infoset from memory and writing the unparsed data. Note that although we
"write" all the unparsed data, we don't actually store it in memory or
write it to a file; it's essentially writing to /dev/null. This minimizes
the overhead of writing data, which can be very hardware dependent.

On 8/20/21 7:22 AM, Steve Lawrence wrote:


Re: How to run performance tests?

Posted by Steve Lawrence <sl...@apache.org>.
If you want to see how long it takes to parse a single file, you can add
the -v flag to the normal parse/unparse command and it will output time
to compile the schema and time to parse. For example:

  $ daffodil -v parse -s foo.dfdl.xsd foo.dat
  [info] Time (compiling): 1631ms
  <?xml version="1.0" encoding="UTF-8"?>
  <foo>bar</foo>
  [info] Time (parsing): 59ms

Note that the "Time (parsing)" value includes reading the file, since
Daffodil reads it in as a stream, so a slow hard drive could affect this
number. Also note that Java takes some time to just-in-time compile and
optimize the Java bytecode, so a single parse is not going to be
representative of the fastest possible speed, compared to once Java is
warmed up.
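
Since the original question also asked about writing the XML to a file,
here is a minimal sketch that wraps the command above, sends the infoset to
a file, and pulls the two timings out of the -v log. It assumes the parse
subcommand's "-o" option for the output file (as I understand the CLI
documentation) and the "[info] Time (...)" line format shown above; time_ms
and timed_parse are hypothetical helper names.

```shell
#!/bin/sh
# Print the millisecond figure from a "[info] Time (<phase>): <n>ms" line.
# Reads the log on stdin; $1 is the phase name, e.g. parsing or compiling.
time_ms() {
  sed -n "s/.*Time ($1): \([0-9][0-9]*\)ms.*/\1/p"
}

# Hypothetical wrapper: parse data file $2 with schema $1, write the
# infoset to $3, and report the timings found in the -v log.
# stderr is merged into the capture so the log is caught either way;
# the infoset itself goes to the file named by -o (an assumed option).
timed_parse() {
  log=$(daffodil -v parse -s "$1" -o "$3" "$2" 2>&1)
  echo "compile ms: $(printf '%s\n' "$log" | time_ms compiling)"
  echo "parse ms:   $(printf '%s\n' "$log" | time_ms parsing)"
}
```

It could be called as "timed_parse foo.dfdl.xsd foo.dat foo.xml". The same
caveats apply: the parse figure still includes disk reads and a cold JVM.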

For these reasons, we've added the "performance" subcommand [1], which
tries to mitigate some of these issues, and can be run like this:

  daffodil performance -s foo.dfdl.xsd -N 100 foo.dat

This will read foo.dat into memory to avoid any overhead caused by
reading from a hard drive. Then it will parse that data 100 times (or
whatever you set -N to), record the time for each individual parse, and
then output some stats, something like this:

  total parse time (sec): 0.159370
  min rate (files/sec): 13.985179
  max rate (files/sec): 167.186282
  avg rate (files/sec): 62.746911

Notice how there is a big difference between the min rate (the slowest
individual parse) and the max rate (the fastest individual parse). This
is because of the just-in-time compilation and optimization that Java
does during the first several parses. To get an accurate measure of the
fastest Daffodil can parse (hardware dependent, of course), I
usually bump up the -N option until max rate stops increasing. This
allows Java to finish all the compilation/optimization.
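
The "bump up -N until max rate stops increasing" step can be scripted. The
following is only a sketch under a few assumptions: the stats are printed on
standard output in the format shown above, a gain of 5% or less counts as
"stopped increasing", and the helper names (max_rate, plateaued,
find_plateau) and the upper limit on -N are invented for illustration.

```shell
#!/bin/sh
# Extract the "max rate" figure from performance output read on stdin.
# Assumes a line like "max rate (files/sec): 167.186282".
max_rate() {
  awk -F': *' '/max rate/ { print $2; exit }'
}

# Succeed if rate $2 is at most $3 percent higher than rate $1.
plateaued() {
  awk -v prev="$1" -v cur="$2" -v pct="$3" \
    'BEGIN { exit !(cur <= prev * (1 + pct / 100)) }'
}

# Double -N until the max rate improves by 5% or less (assumed threshold).
#   $1 = schema file, $2 = data file
find_plateau() {
  schema=$1 data=$2 n=100 prev=0
  while [ "$n" -le 102400 ]; do   # arbitrary cap so the loop always ends
    cur=$(daffodil performance -s "$schema" -N "$n" "$data" | max_rate)
    echo "N=$n max rate=$cur files/sec"
    if plateaued "$prev" "$cur" 5; then break; fi
    prev=$cur
    n=$((n * 2))
  done
}
```

Running "find_plateau foo.dfdl.xsd foo.dat" would print one line per value
of -N tried. The last line gives the settled max rate on that hardware.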

You can also add "--unparse" to the performance command to test
unparsing (the data needs to be an XML file). You can also use the
"--threads" option to increase the number of threads if you're interested
in how threading improves things. Also, the "-v" option mentioned at the
top will show all the individual times if you're interested in that.

- Steve

[1] https://daffodil.apache.org/cli/#performance-subcommand
