Posted to dev@daffodil.apache.org by "Rose, Rob P" <Ro...@gd-ms.com> on 2019/11/08 15:45:27 UTC

CLI Performance usage...

All,

                I am trying to port the Apache Daffodil libraries onto a cross-domain guard that runs in a very small form factor.

                We have cross-compiled OpenJDK 12 for aarch64 (ARM processor) and loaded it into memory.
                I have built the source using sbt (sbt daffodil-cli/stage) and loaded the necessary jars into memory on the board.

                Here are some of the specifics of the hardware platform running on this guard:

*         2 GB DDR RAM

o   Memory Management Unit (MMU) Page Tables used in this system are one-to-one mapping.

*         ARM Cortex A53 4 Core Processor

Here are some of the specifics for the software components:

*         SELinux

*         Busybox

Here are some of the performance numbers we are seeing from the performance testing:

                NOTE:  These tests were run using the attached csv file and the attached schema


# ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
total parse time (sec): 2.443824

*         What does the total parse time value mean?

*         How is it calculated?

*         Is this poor performance?
min rate (files/sec): 1.535568

*         What is the min rate (files/sec)? What does this mean?
max rate (files/sec): 29.460340

*         What is the max rate (files/sec)? What does this mean?
avg rate (files/sec): 40.919485

*         What is the avg rate (files/sec)? What does this mean?


*         Do you have any suggestions for improving parse/unparse speed on an ARM processor?



*         Any suggestions are greatly appreciated!



# ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
total parse time (sec): 3.175893
min rate (files/sec): 1.520884
max rate (files/sec): 107.223428
avg rate (files/sec): 62.974409

# ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
total parse time (sec): 3.656587
min rate (files/sec): 1.551273
max rate (files/sec): 180.155186
avg rate (files/sec): 82.043712


# ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
total parse time (sec): 5.602554
min rate (files/sec): 1.459977
max rate (files/sec): 301.144046
avg rate (files/sec): 178.490026



Sincerely,

Rob Rose
Sr. Principal Software Engineer
General Dynamics Mission Systems
Office: 508-880-1866
Cell:      508-341-5216

This message and/or attachments may include information subject to GD Corporate Policies 07-103 and 07-105 and is intended to be accessed only by authorized recipients.  Use, storage and transmission are governed by General Dynamics and its policies. Contractual restrictions apply to third parties.  Recipients should refer to the policies or contract to determine proper handling.  Unauthorized review, use, disclosure or distribution is prohibited.  If you are not an intended recipient, please contact the sender and destroy all copies of the original message.

Re: CLI Performance usage...

Posted by Rob Rose <rp...@gmail.com>.

Steve,
     Thank you so much for the information!  It is a huge help in understanding the results!

      I am going to perform similar tests using the Java API.
Rob

Re: CLI Performance usage...

Posted by Rob Rose <rp...@gmail.com>.

Steve,



Re: CLI Performance usage...

Posted by Steve Lawrence <sl...@apache.org>.

Hi Rob,

I don't think you are subscribed to the dev list. I'd recommend you
subscribe so you don't miss any responses if someone forgets to reply all.

Before answering your questions, I'll reiterate that the -N option says
how many times to repeat the parse, the -t option says how many threads
to use. So the performance command will parse the test_file.csv file N
times. If t is not 1, it will parallelize those N parses across t threads.

Total parse time is just the wall clock time from the time the first
parse starts to the time the last parse finishes. So in your example it
took about 2.4 seconds to parse test_file.csv 100 times with 5 threads.

The min rate is determined by finding the parse that took the longest of
those N parses and calculating how fast you could parse if it always
took that long. This is just 1 / longest_parse_time.

The max rate is the same, but uses the shortest parse time.

The average rate can really be thought of as throughput. It is
calculated as number_of_parses / wall_clock_time. Usually, increasing
the number of threads will slow down the max rate but increase the
average rate, since more threads generally means more throughput
(until it doesn't and makes things worse ;).
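
As a rough illustration of the definitions above, the following Python sketch computes the four reported values from a list of per-parse durations and a wall-clock time. The names `PerfSummary` and `summarize` are made up for this example and are not part of Daffodil; the timings in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PerfSummary:
    total_time: float  # wall-clock seconds, first parse start to last parse finish
    min_rate: float    # files/sec if every parse took as long as the slowest one
    max_rate: float    # files/sec if every parse took as long as the fastest one
    avg_rate: float    # throughput: number of parses / wall-clock seconds


def summarize(parse_times, wall_clock_time):
    """Summarize per-parse durations (seconds) the way the thread describes."""
    if not parse_times or wall_clock_time <= 0:
        raise ValueError("need at least one parse and a positive wall-clock time")
    return PerfSummary(
        total_time=wall_clock_time,
        min_rate=1.0 / max(parse_times),
        max_rate=1.0 / min(parse_times),
        avg_rate=len(parse_times) / wall_clock_time,
    )


# Hypothetical run: 10 parses of 0.5 s each on 5 threads finish in two
# batches, so the wall-clock time is about 1.0 s.
summary = summarize([0.5] * 10, wall_clock_time=1.0)
print(summary)
```

Because min and max rate are per-parse figures while the average rate divides by wall-clock time, a multi-threaded run can report an average rate above the max rate, as in the -N 100 -t 5 run in this thread (100 parses / 2.443824 s is about 40.9 files/sec).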

Note that min and max rates are often very different (usually orders of
magnitude) because the first bunch of parses are slow while the JVM
does its JIT compilation and optimization. So the max rate is what you
are more likely to see in production once the JVM is warmed up and a
bunch of parses have completed.

And you can see this in your results. Parsing 200 files gave you a max
rate of 107 files/second, but parsing 1000 gave you a max of 300. At
some point, as you increase the number of files the max rate will stop
getting better, which essentially means the JVM is fully warm and
optimized, and that's the fastest rate you'll get.
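
To make the warm-up effect concrete, the reported rates can be inverted back into per-parse times, since each rate is 1 / parse_time. This is only a small Python sketch; the numbers are copied from the four runs posted in this thread, and `parse_time_ms` is a helper name invented here.

```python
# (N, min rate, max rate) in files/sec, copied from the four runs in this thread.
runs = [
    (100, 1.535568, 29.460340),
    (200, 1.520884, 107.223428),
    (300, 1.551273, 180.155186),
    (1000, 1.459977, 301.144046),
]


def parse_time_ms(rate):
    """Invert a files/sec rate into the single-parse time in milliseconds."""
    return 1000.0 / rate


for n, min_rate, max_rate in runs:
    slowest = parse_time_ms(min_rate)
    fastest = parse_time_ms(max_rate)
    print(f"N={n:5d}  slowest parse: {slowest:7.1f} ms  fastest parse: {fastest:6.2f} ms")
```

The slowest parse stays at roughly the same duration in every run, while the fastest parse keeps shrinking as N grows, which is consistent with the JVM still warming up at these values of N.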

As to whether your numbers are good or not, they don't seem particularly
good. On my laptop, with the csv schema and a small csv file, at about
-N 100,000 files and a single thread I get to around 12000 files/sec, so
many times faster than yours.

Unfortunately, I don't have any suggestions specific to ARM. I will say
that Daffodil can be pretty memory hungry, so usually giving more memory
to the JVM helps performance. But the Daffodil CLI defaults to 1024MB,
so you might not be able to bump it much more with your limited RAM.

The other suggestion I have is to decrease the number of threads and see
how things improve. Some libraries we use in Daffodil are not thread
safe, so we have to use things like ThreadLocals, which likely incur
some overhead and slow things down.
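
One way to try this systematically is to script the sweep over -t values. The Python sketch below is an assumption-laden helper, not part of Daffodil: it assumes the CLI lives at ./daffodil (as in the commands posted in this thread) and that it prints its statistics on stdout in exactly the format shown there. The function names are invented for this example.

```python
import re
import subprocess

# Matches lines such as "avg rate (files/sec): 40.919485".
STAT_LINE = re.compile(
    r"^(total parse time|min rate|max rate|avg rate) \((?:sec|files/sec)\): ([0-9.]+)$"
)


def parse_performance_output(text):
    """Extract the four statistics from the output format shown in this thread."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def performance_command(schema, data_file, repeats, threads, exe="./daffodil"):
    """Build the argument list for one 'daffodil performance' run."""
    return [exe, "performance", "-s", schema,
            "-N", str(repeats), "-t", str(threads), data_file]


def sweep_threads(schema, data_file, repeats, thread_counts):
    """Run the CLI once per thread count (assumes stats are printed on stdout)."""
    results = {}
    for threads in thread_counts:
        completed = subprocess.run(
            performance_command(schema, data_file, repeats, threads),
            capture_output=True, text=True, check=True,
        )
        results[threads] = parse_performance_output(completed.stdout)
    return results


# Output of the -N 100 -t 5 run, copied from this thread.
sample = """total parse time (sec): 2.443824
min rate (files/sec): 1.535568
max rate (files/sec): 29.460340
avg rate (files/sec): 40.919485
"""
print(parse_performance_output(sample))
```

On the board, something like sweep_threads("demo/csv.dfdl.xsd", "demo/test_file.csv", 1000, [1, 2, 4, 5]) would then show which thread count gives the best average rate.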

- Steve


Re: CLI Performance usage...

Posted by "Sloane, Brandon" <bs...@tresys.com>.

> By stream mode, do you mean using the Java API rather than the CLI implementation?  Meaning using input/output streams instead of the command line interface?

The Daffodil CLI has a "--stream" flag you can use with parse and unparse.

When used in parse mode, Daffodil will first parse the input as normal. When it reaches the end of the schema, if there is still more data in the input, Daffodil will start a new parse from the point in the input stream where the previous parse left off. The infosets will be output as they are computed, with a NUL character separating them.

For example, you could do:

    cat infile1 infile2 | daffodil parse --stream -s schema.dfdl.xsd

which will output 2 infosets. Depending on what program is feeding the pipe, you do not need to have all your inputs available when you do this (Daffodil will block on stdin if it reaches the end before the writer closes the pipe).
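
On the consuming side, the NUL-separated output can be split back into individual infosets. Here is a minimal Python sketch, assuming the whole output has already been captured as bytes; `split_infosets` is a name invented for this example and the sample infosets are hypothetical.

```python
def split_infosets(stream_output: bytes):
    """Split captured --stream output into infosets on the NUL separator."""
    # Dropping empty chunks makes this work whether or not a NUL also
    # follows the final infoset.
    return [chunk for chunk in stream_output.split(b"\x00") if chunk.strip()]


# Hypothetical output containing two tiny infosets.
sample = b"<a>1</a>\x00<a>2</a>\x00"
for infoset in split_infosets(sample):
    print(infoset.decode("utf-8"))
```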

For a format such as CSV, this can be a bit tricky to do, as you generally detect the end of a document only by the fact that it is at the end of input. If you want to take this approach, you would probably need to create a wrapper format where you, for example, prefix the length of the document. Then you would update your schema to first parse the length, then treat the entire CSV file as a fixed length format. You can make the length field a hidden group so that consumers of the infoset do not need to be updated.
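
The wrapper idea can be sketched on the producer side as follows. This Python example assumes one particular framing, an 8-digit zero-padded ASCII decimal byte count in front of each document; that layout is purely an illustration, and the DFDL schema would have to be written to match whatever prefix is actually chosen. The helper names `frame` and `unframe_all` are invented here.

```python
def frame(document: bytes, width: int = 8) -> bytes:
    """Prefix a document with its byte length as zero-padded ASCII digits."""
    length = len(document)
    if length >= 10 ** width:
        raise ValueError("document too long for the chosen prefix width")
    return f"{length:0{width}d}".encode("ascii") + document


def unframe_all(data: bytes, width: int = 8):
    """Inverse of frame() over a concatenation of framed documents."""
    documents = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + width].decode("ascii"))
        pos += width
        documents.append(data[pos:pos + length])
        pos += length
    return documents


# Two small hypothetical CSV documents written back to back.
first = b"a,b\n1,2\n"
second = b"c,d\n3,4\n5,6\n"
stream = frame(first) + frame(second)
print(stream)
```

With a prefix like this, each document's end is known from its length rather than from end of input, so several documents can be fed through a single pipe.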

> The "parse time" value, is that measured as the amount of time it takes to compile the parser, parse the data according to the schema, and output the data to the console or file?

Steve probably knows this better than I do, and he thinks it is just the time to parse the data. I would caution that, because of how Daffodil is designed, it is possible that some of the work for compilation is actually deferred until parse time. Pre-compiling the parser forces Daffodil to fully compile it before starting the parse, which may be why we have seen pre-compiled parsers score better.

RE: CLI Performance usage...

Posted by "Rose, Rob P" <Ro...@gd-ms.com>.

Brandon,

	Thank you so much for the useful information!  It is a huge help!

	I have a follow-up question:
	You mention "I would suggest either using daffodil in stream mode, or using it as a library as part of a long-lived process".
	By stream mode, do you mean using the Java API rather than the CLI implementation?  Meaning using input/output streams instead of the command line interface?


	Second question:
	The "parse time" value, is that measured as the amount of time it takes to compile the parser, parse the data according to the schema, and output the data to the console or file?

Thanks so much again!
Rob


-----Original Message-----
From: Sloane, Brandon <bs...@tresys.com> 
Sent: Friday, November 8, 2019 11:35 AM
To: dev@daffodil.apache.org
Cc: Hanna, Maria <Ma...@gd-ms.com>
Subject: Re: CLI Performance usage...

I am not familiar with how Daffodil's performance stats are reported (particularly how the average rate is faster than the max rate).

However, the biggest bottleneck for Daffodil performance is schema compilation. If performance is a concern, I would recommend pre-compiling your parser using the `daffodil save-parser` command. You can then use the pre-compiled parser using the '-P' flag instead of '-s'. Note that Daffodil does not have a stable format for pre-compiled parsers, so the Daffodil version used to save the parser would need to match the version used to run it.

A similar issue (which wouldn't be captured by daffodil performance) is startup time. Since Daffodil runs on the JVM, just starting it takes a substantial amount of time (`time daffodil --help` is about 800ms on my development system). On your actual system, I would suggest either using daffodil in stream mode, or using it as a library as part of a long-lived process. If you do either of these, then pre-compiling would help reduce your startup time, but would not offer any additional benefits to throughput.
________________________________
From: Rose, Rob P <Ro...@gd-ms.com>
Sent: Friday, November 8, 2019 10:45 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Cc: Hanna, Maria <Ma...@gd-ms.com>
Subject: CLI Performance usage...


All,



                I am trying to port the Apache daffodil libraries onto an cross domain guard that runs in a very small form factor.



                We have cross compiled OpenJDK 12 for the aarch64 (ARM processor) and loaded into memory.

                I have built the source using sbt (sbt daffodil-cli/stage) and loaded the necessary jars into memory on the board.



                Here are some of the specifics of the hardware platform running on this guard:

*         2 GB DDR RAM

o   Memory Management Unit (MMU) Page Tables used in this system are one-to-one mapping.

*         ARM Cortex A53 4 Core Processor



Here are some of the specifics for the software components:

*         SELinux

*         Busybox



Here are some of the performance numbers we are seeing from the performance testing:



                NOTE:  These tests were run using the attached csv file and the attached schema





# ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv

total parse time (sec): 2.443824

*         What does the total parse time value mean ?

*         How is it calculated ?

*         Is this poor performance?

min rate (files/sec): 1.535568

*         What is the min rate (files/sec)? What does this mean?

max rate (files/sec): 29.460340

*         What is the max rate (files/sec)? What does this mean?

avg rate (files/sec): 40.919485

*         What is the avg rate (files/sec)? What does this mean?



*         Do you have any suggestions on how to improve parse/unparse speed on an ARM processor?



*         Any suggestions are greatly appreciated!







# ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv

total parse time (sec): 3.175893

min rate (files/sec): 1.520884

max rate (files/sec): 107.223428

avg rate (files/sec): 62.974409



# ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv

total parse time (sec): 3.656587

min rate (files/sec): 1.551273

max rate (files/sec): 180.155186

avg rate (files/sec): 82.043712





# ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv

total parse time (sec): 5.602554

min rate (files/sec): 1.459977

max rate (files/sec): 301.144046

avg rate (files/sec): 178.490026







Sincerely,



Rob Rose

Sr. Principal Software Engineer

General Dynamics Mission Systems

Office: 508-880-1866

Cell:      508-341-5216



This message and/or attachments may include information subject to GD Corporate Policies 07-103 and 07-105 and is intended to be accessed only by authorized recipients.  Use, storage and transmission are governed by General Dynamics and its policies. Contractual restrictions apply to third parties.  Recipients should refer to the policies or contract to determine proper handling.  Unauthorized review, use, disclosure or distribution is prohibited.  If you are not an intended recipient, please contact the sender and destroy all copies of the original message.

Re: CLI Performance usage...

Posted by Steve Lawrence <st...@gmail.com>.

I'll point out that the times in the performance command do not include
JVM startup time or schema compilation time. Those numbers are purely
the parse time.

Though, we have seen some cases where using a pre-compiled schema can
make a difference so it's definitely worth a shot.

- Steve


Re: CLI Performance usage...

Posted by "Sloane, Brandon" <bs...@tresys.com>.

I am not familiar with how daffodil's performance stats are reported (particularly how the average rate is faster than the max rate).
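As a rough check on the figures quoted in this thread, the reported avg rate appears to equal the -N value divided by the total parse time. The sketch below only redoes that arithmetic on the four runs; it is inferred from the posted numbers, not taken from the Daffodil source, and the function name `files_per_second` is made up for illustration.

```python
# (N, total parse time in sec, reported avg rate in files/sec),
# copied from the four runs posted in this thread.
runs = [
    (100, 2.443824, 40.919485),
    (200, 3.175893, 62.974409),
    (300, 3.656587, 82.043712),
    (1000, 5.602554, 178.490026),
]


def files_per_second(n_files, total_seconds):
    """Hypothetical helper: overall throughput as files divided by elapsed time."""
    return n_files / total_seconds


for n, total, reported in runs:
    computed = files_per_second(n, total)
    # Print both values side by side so they can be compared by eye.
    print(f"N={n:5d}  computed={computed:.3f}  reported={reported:.3f}")
```

If that reading is right, the avg rate is measured over the whole run with all 5 threads working at once, while the min and max rates may be timed per file on a single thread. That would be one way for the average to exceed the max, but it is an assumption and would need to be confirmed against the CLI code.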

However, the biggest bottleneck for Daffodil performance is schema compilation. If performance is a concern, I would recommend pre-compiling your parser using the `daffodil save-parser` command. You can then use the pre-compiled parser using the '-P' flag instead of '-s'. Note that Daffodil does not have a stable format for pre-compiled parsers, so the Daffodil version used to save the parser would need to match the version used to run it.

A similar issue (which wouldn't be captured by daffodil performance) is startup time. Since Daffodil runs on the JVM, just starting it takes a substantial amount of time (`time daffodil --help` is about 800ms on my development system). On your actual system, I would suggest either using daffodil in stream mode, or using it as a library as part of a long-lived process. If you do either of these, then pre-compiling would help reduce your startup time, but would not offer any additional benefits to throughput.