Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2022/03/31 20:33:00 UTC

[jira] [Updated] (DAFFODIL-2684) daffodil-cli splitParse mode

     [ https://issues.apache.org/jira/browse/DAFFODIL-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Beckerle updated DAFFODIL-2684:
------------------------------------
    Issue Type: New Feature  (was: Bug)

> daffodil-cli splitParse mode
> ----------------------------
>
>                 Key: DAFFODIL-2684
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2684
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: CLI
>    Affects Versions: 3.3.0
>            Reporter: Mike Beckerle
>            Priority: Major
>
> A common way Daffodil is used involves first splitting data off of a TCP stream or other input stream, and then handing each split (a byte array) to Daffodil to parse a single message. 
> This differs from the current CLI "streaming" mode in how errors are handled. The existing streaming mode cannot tolerate errors: any error halts parsing of the entire stream. The only way to parse an entire stream that includes a mixture of correct and malformed data is to use a DFDL schema that accepts even malformed data and creates elements from it (e.g., <invalid>8929AFB3892</invalid>).
> But this is unnatural and adds complexity to the DFDL schema that wouldn't otherwise be needed. 
> The split-and-parse method can continue to parse the next message even after a failure to parse. The only failure that is fatal to the whole processing run is being unable to meaningfully split a message from the data stream.
> So we want a split-and-parse capability in the CLI. This mode uses two DFDL schemas: a (very simple) splitter schema and a regular parse schema. The splitter schema does just the minimum needed to split a message from the stream; the byte array produced by the split is then parsed with the regular parse schema.
> There is no real symmetric unparser equivalent of this split-and-parse behavior. Regular streaming unparsing works.
> The prototype of this idea is in the splitAndParse subdirectory/project of the OpenDFDL examples repository on GitHub. This code was authored 100% by mbeckerle (Daffodil PMC) and is intended as a contribution to Daffodil, so there is no issue pulling it, or parts of it, into Daffodil.
> Suggested command line:
> {code:java}
> daffodil parse --stream --splitterSchema filename ... other options as per parse. {code}
> When --stream is specified, the --splitterSchema option is available. If used, it provides the file name of a splitter DFDL schema.
> If the splitter DFDL schema is precompiled, the options would be:
> {code:java}
> daffodil parse --stream --splitterParser binaryfilename ... other options as per parse. {code}
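
For illustration only, the split-and-parse control flow described in the issue could be sketched as follows in Python. This is not Daffodil code and does not use the Daffodil API: the 2-byte length-prefix framing and the names split_messages, parse_message, and split_and_parse are hypothetical stand-ins for the splitter schema and the regular parse schema. The sketch shows only the error-handling difference: a parse failure is recorded and processing continues with the next message, while a failure to split is fatal to the run.

```python
import io
import struct


class SplitError(Exception):
    """Raised when a message cannot be split from the stream (fatal)."""


def split_messages(stream):
    """Yield one message (bytes) at a time from a binary stream.

    Hypothetical framing, standing in for a splitter schema:
    a 2-byte big-endian length prefix followed by the payload.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise SplitError("truncated payload")
        yield payload


def parse_message(payload):
    """Hypothetical stand-in for the regular parse schema.

    Accepts only payloads made of ASCII decimal digits.
    """
    text = payload.decode("ascii")  # UnicodeDecodeError is a ValueError
    if not text.isdigit():
        raise ValueError(f"malformed message: {text!r}")
    return int(text)


def split_and_parse(stream):
    """Parse every message; record parse errors and keep going.

    A SplitError is deliberately not caught here, because losing
    message boundaries makes the rest of the stream unusable.
    """
    results = []
    for payload in split_messages(stream):
        try:
            results.append(("ok", parse_message(payload)))
        except ValueError as err:
            results.append(("error", str(err)))
    return results


if __name__ == "__main__":
    def frame(body):
        return struct.pack(">H", len(body)) + body

    data = frame(b"12") + frame(b"xx") + frame(b"7")
    for status, value in split_and_parse(io.BytesIO(data)):
        print(status, value)
```

In this sketch the malformed middle message produces an "error" entry, and the message after it is still parsed, which is the behavior the existing streaming mode cannot provide.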



--
This message was sent by Atlassian Jira
(v8.20.1#820001)