Posted to dev@daffodil.apache.org by "Steve Lawrence (JIRA)" <ji...@apache.org> on 2018/01/24 17:37:00 UTC

[jira] [Commented] (DAFFODIL-1878) Parse and unparse files/sec values vary significantly using compiled and saved parsers

    [ https://issues.apache.org/jira/browse/DAFFODIL-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337944#comment-16337944 ] 

Steve Lawrence commented on DAFFODIL-1878:
------------------------------------------

I am unable to find an obvious cause for the speedup seen when using saved parsers compared to non-saved ones. Profiling shows that runtime memory allocations and CPU call paths/times appear to be approximately the same.

However, one noticeable difference is that when not using a saved parser, many more objects exist that cannot be garbage collected. One cause (there may be others) is that many of the *RuntimeData classes reference, through transient parameters, classes that are only needed for schema compilation (the parameters are often functions that reference schema compilation objects like dsom, which references grammar, etc.). Because these parameters are transient, the references go away when we serialize the RuntimeData objects. But when the objects are not serialized, the references still exist and the schema compilation objects cannot be garbage collected. My first inclination is to set the transient parameters to null after their values are evaluated in preserialize (which is always called, regardless of serialization). But this requires the parameters to be vars, and var parameters cannot be passed by name. We may need to rethink how these RuntimeData classes are implemented to prevent references to schema compilation classes and allow memory to be freed. For very complex schemas, this could cause issues if not using a saved parser.
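
To illustrate the constraint described above, here is a minimal sketch, not Daffodil's actual code. SchemaComponent, RuntimeDataA, RuntimeDataB, and holdsCompilerReference are hypothetical names, and preSerialization only stands in for the preserialize step mentioned above. Variant A shows the by-name pattern, where the thunk lives in a compiler-generated field that cannot be reassigned. Variant B shows one possible alternative, assuming callers can pass an explicit function instead of a by-name argument, so that the reference can be dropped once the value is computed.

```scala
// Hypothetical stand-in for a schema compilation object (dsom, grammar, ...).
final class SchemaComponent(val name: String)

// Variant A: by-name constructor parameter.
// Because nameArg is used outside the constructor body, the compiler keeps
// the thunk in a hidden field. That field cannot be set to null from user
// code, so everything the thunk closes over stays reachable.
final class RuntimeDataA(nameArg: => String) extends Serializable {
  lazy val name: String = nameArg
  // Forces every lazy value, whether or not the object is later serialized.
  def preSerialization: Any = name
}

// Variant B (assumption: call sites can pass () => value explicitly).
// The thunk is held in a transient var, so it is skipped by serialization
// and can also be released in memory after it has been evaluated.
final class RuntimeDataB(nameArg: () => String) extends Serializable {
  @transient private var thunk: () => String = nameArg

  lazy val name: String = {
    val value = thunk()
    thunk = null // drop the only reference to the compile-time closure
    value
  }

  def preSerialization: Any = name

  // Exposed only so the sketch can show when the reference is released.
  def holdsCompilerReference: Boolean = thunk != null
}
```

With Variant B, holdsCompilerReference is true right after construction with an argument such as () => component.name, and false once preSerialization has run. The cost is that every call site has to wrap its argument in an explicit function, which is a wider change than nulling a field.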

I'm not sure whether all these extra objects that cannot be garbage collected cause this particular problem, but they could be causing more time to be spent in the garbage collector, or related memory issues, and slowing things down.

> Parse and unparse files/sec values vary significantly using compiled and saved parsers
> --------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-1878
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1878
>             Project: Daffodil
>          Issue Type: Bug
>          Components: General
>    Affects Versions: 2.1.0
>            Reporter: Dave Thompson
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.1.0
>
>         Attachments: Performance report for 01-19-2018.docx
>
>
> After updating the nightly scripts to use the pre-compiled/saved parsers, the results show that for some tests there are significant differences in performance values between using runtime compiled parsers and the saved parsers.
> Attached is the report email. The Previous Val column is from the non-saved parser run. The Curr Val column is from the previously compiled/saved parsers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)