Posted to dev@daffodil.apache.org by "Steve Lawrence (JIRA)" <ji...@apache.org> on 2018/07/17 14:39:00 UTC

[jira] [Updated] (DAFFODIL-1967) Support --stream option for CLI unparse subcommands

     [ https://issues.apache.org/jira/browse/DAFFODIL-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Lawrence updated DAFFODIL-1967:
-------------------------------------
    Description: 
The --stream option for the parse subcommand currently outputs repetitions of XML data, e.g.
{code:xml}
<foo>...</foo>
<foo>...</foo>
<foo>...</foo>
{code}

Since there is no root element, this is not a well-formed XML document, and the libraries we use to parse the XML string throw an error. So, in order to support streaming XML data into unparsing, we need to manually split the XML before giving it to the XML parsing libraries. Parsing the XML ourselves and simply tolerating the missing root element is an option, but might be more effort than it is worth. Another option is to output some sort of delimiter between each XML document and split the data on that. Extra care is needed to ensure that we do not split if the XML content itself contains that delimiter.
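
As a rough illustration of the delimiter option, here is a minimal sketch. It is written in Python for brevity and is not Daffodil code. The delimiter choice (a NUL byte, which is not a legal character in XML 1.0 and so cannot appear in well-formed content) and the names DELIMITER and split_documents are assumptions made for this example only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical delimiter: NUL is not a legal XML 1.0 character, so it
# cannot occur inside any well-formed document in the stream.
DELIMITER = b"\x00"


def split_documents(stream, delimiter=DELIMITER, chunk_size=4096):
    """Yield each delimiter-separated document from a binary stream as bytes."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete document currently held in the buffer.
        while True:
            index = buffer.find(delimiter)
            if index < 0:
                break
            document = buffer[:index]
            buffer = buffer[index + len(delimiter):]
            if document.strip():
                yield document
    # Whatever remains at end of stream is the final document.
    if buffer.strip():
        yield buffer


# Three root-less repetitions, separated by the assumed delimiter.
data = b"<foo>1</foo>\x00<foo>2</foo>\x00<foo>3</foo>\n"
roots = [ET.fromstring(doc) for doc in split_documents(io.BytesIO(data), chunk_size=5)]
texts = [root.text for root in roots]
```

Each yielded chunk is a complete document with its own root element, so it can be handed to an ordinary XML parser. The small chunk_size in the usage lines is only there to exercise documents that span several reads.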

  was:
The --stream option was only implemented for the parse CLI subcommand. Ideally, this would work for both unparse and performance subcommands as well. The issues with each are:

*Unparse:*

The --stream option for the parse subcommand currently outputs repetitions of XML data, e.g.
{code:xml}
<foo>...</foo>
<foo>...</foo>
<foo>...</foo>
{code}

Since there is no root element, this is not a well-formed XML document, and the libraries we use to parse the XML string throw an error. So, in order to support streaming XML data into unparsing, we need to manually split the XML before giving it to the XML parsing libraries. Parsing the XML ourselves and simply tolerating the missing root element is an option, but might be more effort than it is worth. Another option is to output some sort of delimiter between each XML document and split the data on that. Extra care is needed to ensure that we do not split if the XML content itself contains that delimiter.

*Performance:*

The performance subcommand currently works by creating a ByteBuffer and repeatedly calling parse on it. In order to test streaming performance, we would need to create an InputStream and continuously provide data to it, perhaps via a PipedInputStream/PipedOutputStream pair or something similar.

     Issue Type: New Feature  (was: Bug)
        Summary: Support --stream option for CLI unparse subcommands  (was: Support --stream option for CLI unparse and performance subcommands)

> Support --stream option for CLI unparse subcommands
> ---------------------------------------------------
>
>                 Key: DAFFODIL-1967
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1967
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: CLI
>    Affects Versions: 2.2.0
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.2.0
>
>
> The --stream option for the parse subcommand currently outputs repetitions of XML data, e.g.
> {code:xml}
> <foo>...</foo>
> <foo>...</foo>
> <foo>...</foo>
> {code}
> Since there is no root element, this is not a well-formed XML document, and the libraries we use to parse the XML string throw an error. So, in order to support streaming XML data into unparsing, we need to manually split the XML before giving it to the XML parsing libraries. Parsing the XML ourselves and simply tolerating the missing root element is an option, but might be more effort than it is worth. Another option is to output some sort of delimiter between each XML document and split the data on that. Extra care is needed to ensure that we do not split if the XML content itself contains that delimiter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)