Posted to dev@daffodil.apache.org by "Steve Lawrence (Jira)" <ji...@apache.org> on 2019/08/21 15:31:00 UTC

[jira] [Commented] (DAFFODIL-1444) Performance - schema compilation

    [ https://issues.apache.org/jira/browse/DAFFODIL-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912412#comment-16912412 ] 

Steve Lawrence commented on DAFFODIL-1444:
------------------------------------------

Please see DAFFODIL-2192, which appears to have been broken in the incremental commit to fix this (commit ae3fba0a08cb), and also has potential implications for how we can reduce copying branches, since some logic must be branch aware.

> Performance - schema compilation
> --------------------------------
>
>                 Key: DAFFODIL-1444
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1444
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Front End, Middle "End", Performance
>            Reporter: Michael Beckerle
>            Assignee: Michael Beckerle
>            Priority: Major
>         Attachments: Daffodil-exponential-component-growth.xlsx
>
>
> Large DFDL schemas are very slow to compile.
> We could focus on speeding this up, and should get some low-hanging fruit here.
> But ultimately, a really large DFDL schema needs to be compiled in pieces. (DEBATABLE - focus should FIRST be on speeding up and reducing the massive copying that goes on. Separate compilation is a harder issue that we can defer.)
> This means we need to be able to reload a compiled schema just to restore its parsers/unparsers and associated runtime data structures to memory so that another schema that depends on it can then be compiled.
> DFDL schema compilation needs to be understood in order to decompose a schema into separately compilable units. There's no point in trying to compile a schema layer by layer - a DFDL schema containing only type definitions, for example, doesn't compile to anything. There have to be top-level elements in order for DFDL schema compilation to do anything.
> So given a large data format with many top-level element types, we need the compiler to recognize element references to pre-compiled top-level elements, and avoid recompiling new instances of them if the surrounding environment is the same, that is, if the surrounding default format specification is the same.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)