Posted to commits@daffodil.apache.org by "Dave Thompson (Jira)" <ji...@apache.org> on 2020/09/24 20:16:00 UTC

[jira] [Closed] (DAFFODIL-934) Streaming parser: Need to stream input data in, and infoset out to handle arbitrarily large data.

     [ https://issues.apache.org/jira/browse/DAFFODIL-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Thompson closed DAFFODIL-934.
----------------------------------

Verified the specified commit (b0f59ef7c8b3a1088183c40691df7f3fd10ff864) is included in the latest pull from the incubator-daffodil repository.

Verified the affected incubator-daffodil sbt test suites execute successfully.

Verified the nightly test schemas compile and save successfully.

Verified the nightly test suite executes successfully.

Note: although the nightly performance test suite executed successfully, the parse tests showed performance degradation across the data formats, in some cases significant.

JIRA ticket DAFFODIL-2396 was created to address the performance degradation.

> Streaming parser: Need to stream input data in, and infoset out to handle arbitrarily large data.
> -------------------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-934
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-934
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: s13
>            Reporter: Mike Beckerle
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently Daffodil requires that all incoming data fit in one java.nio.ByteBuffer. A separate issue (DFDL-881) is about allowing > 4GB files, but data sizes would still be limited by available address space.
> A streaming approach has great advantages. It requires that the input can be streamed in (e.g., from a java.io.InputStream), but also requires that the DFDL Infoset can be streamed out. (Think SAX parser 'events' coming out).  This is complicated by the DFDL notion of points of uncertainty. E.g., until a choice branch has been resolved none of the elements on any branch can be emitted since "backtracking" may invalidate them.
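
Illustration of the point-of-uncertainty constraint described above. This is a minimal sketch, not Daffodil's implementation or API: the class SpeculativeEventBuffer and its methods mark, event, backtrack, and resolve are hypothetical names, and infoset events are modeled as plain strings. It covers only the output side. Events are passed downstream immediately when no point of uncertainty is open, and are held back otherwise so that backtracking can discard them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Daffodil's API): holds infoset events back
 * while any point of uncertainty (PoU) is still unresolved.
 */
final class SpeculativeEventBuffer {
    private final Consumer<String> downstream;                 // receives committed events
    private final List<String> pending = new ArrayList<>();    // events not yet safe to emit
    private final Deque<Integer> marks = new ArrayDeque<>();   // pending.size() at each open PoU

    SpeculativeEventBuffer(Consumer<String> downstream) {
        this.downstream = downstream;
    }

    /** Enter a point of uncertainty, e.g. before trying a choice branch. */
    void mark() {
        marks.push(pending.size());
    }

    /** Record an event; emit it immediately only when no PoU is open. */
    void event(String e) {
        if (marks.isEmpty()) {
            downstream.accept(e);
        } else {
            pending.add(e);
        }
    }

    /** The attempt failed: discard everything produced since the innermost mark. */
    void backtrack() {
        int start = marks.pop(); // throws NoSuchElementException if no PoU is open
        pending.subList(start, pending.size()).clear();
    }

    /** The attempt succeeded: once no PoU remains open, flush the held events in order. */
    void resolve() {
        marks.pop(); // throws NoSuchElementException if no PoU is open
        if (marks.isEmpty()) {
            pending.forEach(downstream);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        SpeculativeEventBuffer buf = new SpeculativeEventBuffer(out::add);
        buf.event("start record");   // no PoU open: emitted at once
        buf.mark();
        buf.event("branchA.x");      // held back
        buf.backtrack();             // branch A failed: its event is dropped
        buf.mark();
        buf.event("branchB.y");      // held back
        buf.resolve();               // branch B succeeded: its event is flushed
        buf.event("end record");
        System.out.println(out);     // prints [start record, branchB.y, end record]
    }
}
```

In this sketch the memory held at any time is bounded by the events produced inside the outermost unresolved point of uncertainty, not by the size of the whole infoset.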



--
This message was sent by Atlassian Jira
(v8.3.4#803005)