Posted to dev@daffodil.apache.org by "Michael Beckerle (JIRA)" <ji...@apache.org> on 2018/03/09 22:03:01 UTC

[jira] [Closed] (DAFFODIL-1808) JPEG schema accepts too many non-JPEG data files

     [ https://issues.apache.org/jira/browse/DAFFODIL-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Beckerle closed DAFFODIL-1808.
--------------------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 2.2.0)

This ticket isn't about Daffodil itself; it is about the JPEG DFDL schema.

Ticket https://github.com/DFDLSchemas/JPEG/issues/2 replaces this ticket. 

> JPEG schema accepts too many non-JPEG data files
> ------------------------------------------------
>
>                 Key: DAFFODIL-1808
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1808
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: DFDL Schemas
>            Reporter: Michael Beckerle
>            Priority: Major
>
> The JPEG DFDL schema is much too permissive: arbitrary blobs of binary data are often accepted. To date, the schema only checks whether the file is a sequence of JPEG segments. Unfortunately, one segment type is effectively just a data blob, so many data blobs are accepted.
> To overcome this, additional constraint checking is needed. It can be expressed using dfdl:assert statements in the DFDL schema. Two such assertions already exist: they require the first segment to be SOI (start of image) and the last to be EOI (end of image). However, a blob of bytes between SOI and EOI is still accepted even when it is clearly not a JPEG image.
> In some cases the constraint rules will need more expressive power than these assertions offer, that is, true XPath query capability.
> The Schematron rule language could be used for those cases. See also DFDL-1807 (Schematron) in case it proves to be needed.
> Note that this is not "validation" of the data. It uses what we normally think of as a validation language, but only to check whether the data is well-formed.
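
To make the gap described in the ticket concrete, here is a minimal Python sketch of the kind of well-formedness check that goes beyond "starts with SOI, ends with EOI". It is not taken from Daffodil or from the JPEG schema; the function names (read_markers, looks_like_jpeg) are hypothetical, and the extra rule it applies (a frame header, SOFn, must appear before the first scan, SOS) is only an assumption about what a tighter constraint could look like. Marker codes follow the usual JPEG marker table.

```python
SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# Markers that carry no length field: TEM, SOI, EOI and RST0..RST7.
STANDALONE = {0x01, SOI, EOI, *range(0xD0, 0xD8)}
# Frame headers SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
SOF = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def read_markers(data: bytes) -> list[int]:
    """Walk the segment structure and return the marker codes in order.

    Raises ValueError when the bytes cannot be read as a marker sequence.
    """
    markers = []
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF or pos + 1 >= len(data):
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte in front of a marker
            pos += 1
            continue
        if code == 0x00:
            raise ValueError(f"stuffed byte outside scan data at offset {pos}")
        markers.append(code)
        pos += 2
        if code in STANDALONE:
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment length")
        # The two-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(data[pos:pos + 2], "big")
        if length < 2 or pos + length > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        pos += length
        if code == SOS:
            # Skip entropy-coded data: stop at 0xFF followed by anything
            # other than a stuffed 0x00 or a restart marker.
            while pos < len(data):
                if (data[pos] == 0xFF and pos + 1 < len(data)
                        and data[pos + 1] != 0x00
                        and not 0xD0 <= data[pos + 1] <= 0xD7):
                    break
                pos += 1
    return markers


def looks_like_jpeg(data: bytes) -> bool:
    """Assumed rule set: SOI first, EOI last, and a SOFn before the first SOS."""
    try:
        markers = read_markers(data)
    except ValueError:
        return False
    if len(markers) < 2 or markers[0] != SOI or markers[-1] != EOI:
        return False
    if SOS not in markers:
        return False
    return any(m in SOF for m in markers[:markers.index(SOS)])
```

Under these assumptions, a file consisting of SOI, a single application segment holding arbitrary bytes, and EOI passes the two existing checks but is rejected here, which is the case the ticket complains about. A real fix would state comparable rules in the schema itself, for example as further dfdl:assert statements.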


