Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2019/12/16 16:40:03 UTC

[jira] [Commented] (DAFFODIL-2254) BLOB support for scanning for end of blob

    [ https://issues.apache.org/jira/browse/DAFFODIL-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997443#comment-16997443 ] 

Mike Beckerle commented on DAFFODIL-2254:
-----------------------------------------

It is unclear whether our interpretation of lengthKind 'delimited' as "delimited OR end of data" is compliant with the spec. I doubt that it actually is, so we should not depend on it.

I think the only case in the DFDL spec where a delimiter is allowed to be absent when lengthKind is 'delimited' is dfdl:documentFinalTerminatorCanBeMissing.

That won't work for blobs. You would be scanning for a delimiter that is allowed to be missing at the end, but you would be scanning for it in arbitrary binary data, so the delimiter bytes might turn up somewhere in the middle of the blob.
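To make the false-match problem concrete, here is a minimal Python sketch. It assumes a hypothetical two-byte delimiter and made-up payloads (none of these come from the issue), and a scan that stops at the first delimiter occurrence or otherwise accepts end of data as the end of the field, roughly the "delimited OR end of data" interpretation. As soon as the delimiter bytes happen to occur inside the binary payload, the field is truncated.

```python
def scan_delimited(data: bytes, delimiter: bytes) -> bytes:
    """Return the field content: everything before the first delimiter,
    or all of the data if the delimiter is never found (end of data is
    accepted as the end of the field)."""
    idx = data.find(delimiter)
    return data if idx == -1 else data[:idx]

# Hypothetical delimiter, chosen only for illustration.
DELIM = b"\x00\x01"

# A payload that happens not to contain the delimiter bytes is returned whole.
clean_payload = bytes([0x10, 0x20, 0x30, 0x40])
assert scan_delimited(clean_payload, DELIM) == clean_payload

# Compressed or otherwise arbitrary binary data can contain the delimiter
# bytes by chance, so the scan stops in the middle of the blob.
binary_payload = bytes([0x10, 0x00, 0x01, 0x20, 0x30])
print(len(scan_delimited(binary_payload, DELIM)))  # 1, not 5
```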

The only thing in DFDL today that matches this semantic is dfdl:lengthKind 'endOfParent', which is intended for exactly this sort of thing.

We should assess how hard implementing endOfParent would be. If it isn't very hard, we should implement it. If it is, perhaps supporting it only for BLOBs, as a special case, would work as an initial implementation?
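For BLOBs, an endOfParent-style length needs no scanning at all: the parser just copies whatever remains of the enclosing data into the blob file. Below is a minimal Python sketch of that BLOB-only special case. It assumes the parent ends at the end of the stream (true for the RPM payload in the issue description, but a simplification in general, since a real parent can be shorter than the stream), and the function name and data layout are hypothetical, not Daffodil's API.

```python
import io
from typing import BinaryIO

def copy_blob_to_end_of_parent(src: BinaryIO, dst: BinaryIO,
                               chunk_size: int = 64 * 1024) -> int:
    """Copy every remaining byte of src into dst in fixed-size chunks.

    No delimiter is searched for: the blob ends where the enclosing data
    (here assumed to be the whole stream) ends, so binary content can never
    cause an early stop. Returns the blob length in bytes.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of parent (end of stream in this sketch)
            return total
        dst.write(chunk)
        total += len(chunk)

# Hypothetical layout loosely modeled on the RPM case: a fixed-size header
# followed by a payload that runs to the end of the file.
data = b"HDR!" + bytes(range(256)) * 3
src = io.BytesIO(data)
header = src.read(4)        # fields parsed before the blob
blob_file = io.BytesIO()    # stands in for an on-disk blob file
length = copy_blob_to_end_of_parent(src, blob_file, chunk_size=100)
print(header, length)       # b'HDR!' 768
```

Copying in chunks keeps memory use bounded by `chunk_size` regardless of payload size, which is the point of writing blobs to a file instead of holding them in memory.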

> BLOB support for scanning for end of blob
> -----------------------------------------
>
>                 Key: DAFFODIL-2254
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2254
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: Back End
>    Affects Versions: 2.5.0
>            Reporter: Steve Lawrence
>            Priority: Major
>
> To keep the initial BLOB implementation relatively simple, it only supports lengthKind="explicit". However, some formats may need a scanning mechanism to find the end of the BLOB. One example of this is RPM, which has a compressed payload at the end of the file, and the length is just everything from one point to the end of the file. We ideally want to treat this payload as a blob, but since there is no explicit length for it, some sort of scanning must occur.
> Some initial thoughts for potential solutions:
> 1. lengthKind="delimited" allows for the end of the data stream to be a delimiter. So we could just make changes to delimiter scanning to support outputting the field to a blob file rather than somewhere in internal memory. This would allow for supporting delimited blobs in more cases than just end of stream, but could be a somewhat challenging change.
> 2. Support endOfParent for blobs, or perhaps a new lengthKind, or a blob-specific property that can be used for end of data. The logic here would likely be much simpler than delimiter scanning, but it is fairly single-purpose.
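Option 1 implies streaming the field to a blob file while scanning, and that is where the difficulty lies: a delimiter may straddle two reads, and bytes read past the delimiter belong to the next field. The following is a minimal Python sketch of that idea. It assumes a single fixed, non-empty delimiter (real DFDL delimiters can be lists and can use character class entities, which this ignores), and all names are hypothetical, not Daffodil's API.

```python
import io
from typing import BinaryIO, Tuple

def stream_delimited_blob(src: BinaryIO, dst: BinaryIO, delimiter: bytes,
                          chunk_size: int = 64 * 1024) -> Tuple[bool, bytes]:
    """Write bytes from src to dst until the delimiter or end of stream.

    Returns (found, leftover): found is False when the field was ended by
    end of stream; leftover holds bytes already read past the delimiter,
    which a caller must treat as the start of the next field.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    keep = len(delimiter) - 1  # longest possible partial delimiter
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            dst.write(pending)  # end of stream terminates the field
            return False, b""
        pending += chunk
        idx = pending.find(delimiter)
        if idx != -1:
            dst.write(pending[:idx])
            return True, pending[idx + len(delimiter):]
        # Flush all but the last `keep` bytes, which might be the start
        # of a delimiter that continues in the next chunk.
        cut = len(pending) - keep
        if cut > 0:
            dst.write(pending[:cut])
            pending = pending[cut:]

# Hypothetical data: the delimiter deliberately spans chunk boundaries.
DELIM = b"--END--"
src = io.BytesIO(b"A" * 10 + DELIM + b"rest")
out = io.BytesIO()
found, leftover = stream_delimited_blob(src, out, DELIM, chunk_size=4)
print(found, out.getvalue(), leftover + src.read())  # True b'AAAAAAAAAA' b'rest'
```

Even this simplified version has to manage a carry-over buffer and hand unread bytes back to the caller, which hints at why extending Daffodil's general delimiter scanning to blobs could be a larger change than a dedicated end-of-data mechanism.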

--
This message was sent by Atlassian Jira
(v8.3.4#803005)