Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2020/06/10 13:49:00 UTC

[jira] [Commented] (DAFFODIL-2351) layer improvements to enable JPEG format

    [ https://issues.apache.org/jira/browse/DAFFODIL-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130689#comment-17130689 ] 

Mike Beckerle commented on DAFFODIL-2351:
-----------------------------------------

See also DAFFODIL-1927, the extensible layer transforms feature.
There are numerous other tickets about layering. Search the tickets for "layer" to find them.

> layer improvements to enable JPEG format
> ----------------------------------------
>
>                 Key: DAFFODIL-2351
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2351
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 2.6.0
>            Reporter: Mike Beckerle
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The JPEG format has "Entropy Coded Segments" (ECS).
> These are terminated by the byte-pattern that indicates the start of the following JPEG segment, so we need the ability to isolate these bytes by finding, but not consuming, the start of the next segment. 
> Currently the only way to do this is with lengthKind='pattern' and a regex with lookahead. This is problematic because regex scanning is implemented with buffers that are gradually enlarged as needed. The buffers cannot be made big enough, so this will simply not work for JPEGs with very large images (the JPEG2000 format has the same problem and holds even larger images).
> The ability to define a layer that contains data up to, but not including, a particular marker is needed. In JPEG the marker is a 2-byte sequence.
> In addition, for JPEG, these ECS segments are "byte stuffed". This is an escaping scheme: wherever the first byte of the marker occurs in the data, a zero byte is inserted after it so that the sequence does not match the marker. The layer transform must remove this inserted zero from the data on parsing and re-insert it on unparsing.
> Finally, the implementation of this feature must not require staging a copy of the entire contents of the ECS segment in any array, so long as the ultimate destination of the bytes is a DFDL BLOB (an extension to DFDL v1.0). These layers need to allow streaming the bytes of the ECS segment out to an external BLOB (e.g., a BLOB file) without creating any object in Daffodil process memory that is the size of the whole ECS segment.
>    
>  
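
To make the three requirements in the description concrete (find the marker without consuming it, undo the byte stuffing, and never hold the whole segment in memory), here is a rough Python sketch. It is not Daffodil code and not the layer API. The names unstuff_until_marker, stuff, ESCAPE and STUFF are made up for illustration. It assumes the first marker byte is 0xFF, as in JPEG, that the input stream is seekable, and that any 0xFF not followed by a zero byte starts a marker. Finer points of real JPEG, such as restart markers, are ignored.

```python
import io

ESCAPE = 0xFF  # assumed first byte of every marker (as in JPEG)
STUFF = 0x00   # byte inserted after ESCAPE inside the segment data


def unstuff_until_marker(src, sink, chunk_size=4096):
    """Copy one byte-stuffed segment from src to sink, dropping stuffed zeros.

    Stops at the first ESCAPE byte that is not followed by STUFF and leaves
    src positioned on that ESCAPE byte, so the marker is found but not
    consumed. Only chunk_size bytes are held in memory at a time.
    src must be a seekable binary stream. Returns the number of bytes written.
    """
    written = 0
    pending_escape = False  # previous byte was ESCAPE and is still undecided
    while True:
        chunk_start = src.tell()
        chunk = src.read(chunk_size)
        if not chunk:
            if pending_escape:
                # Data ended right after an ESCAPE: treat it as a marker start.
                src.seek(chunk_start - 1)
            return written
        out = bytearray()
        for i, b in enumerate(chunk):
            if pending_escape:
                pending_escape = False
                if b == STUFF:
                    out.append(ESCAPE)  # stuffed pair decodes to one ESCAPE
                    continue
                # Marker found: flush what we have and rewind onto the ESCAPE.
                sink.write(out)
                written += len(out)
                src.seek(chunk_start + i - 1)
                return written
            if b == ESCAPE:
                pending_escape = True
            else:
                out.append(b)
        sink.write(out)
        written += len(out)


def stuff(src, sink, chunk_size=4096):
    """Unparse direction: copy src to sink, inserting STUFF after each ESCAPE."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return
        sink.write(chunk.replace(bytes([ESCAPE]), bytes([ESCAPE, STUFF])))
```

For example, given the input bytes 12 FF 00 34 FF D9 in an io.BytesIO, unstuff_until_marker writes 12 FF 34 to the sink and leaves the input positioned at FF D9. In a real layer the sink would be the BLOB file, and a non-seekable input would need a one-byte pushback instead of seek.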



--
This message was sent by Atlassian Jira
(v8.3.4#803005)