Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/01/15 23:00:22 UTC

[jira] [Comment Edited] (PDFBOX-1685) Verify interpretation of rdf:about for PDF/A

    [ https://issues.apache.org/jira/browse/PDFBOX-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872440#comment-13872440 ] 

Andreas Lehmkühler edited comment on PDFBOX-1685 at 1/15/14 9:59 PM:
---------------------------------------------------------------------

Added to the 1.8 branch in revisions 1558521 and 1558581


was (Author: lehmi):
Added to the 1.8 branch in revision 1558521

> Verify interpretation of rdf:about for PDF/A
> --------------------------------------------
>
>                 Key: PDFBOX-1685
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1685
>             Project: PDFBox
>          Issue Type: Task
>          Components: Preflight
>            Reporter: Maruan Sahyoun
>            Assignee: Eric Leleu
>            Priority: Minor
>             Fix For: 1.8.4, 2.0.0
>
>         Attachments: test-bfo.pdf
>
>
> There was a discussion about handling rdf:about for PDF/A validation on the PDF Association mailing list, which I'm allowed to share:
> <snip>
> In this case we have a PDF with an XMP metadata stream containing two
> <rdf:RDF> entries, one with rdf:about set to a blank string, the other with
> it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
> simply says that the stream must conform to the "XMP specification 2004
> revision" which reads (p21):
> The rdf:about attribute on the rdf:Description element is a required
> attribute that identifies the resource whose metadata this XMP describes.
> The value of this attribute must follow URI syntax and may be either:
> ●  an empty string (as in the example above), which means that the XMP is
> physically local to the resource being described. Applications must rely on
> knowledge of the file format to correctly associate the XMP with the
> resource.
> ●  a unique instance ID that is generated every time a file is saved. The
> next section gives guidelines for creating instance IDs.
> The XMP packet must describe a single entity, and my reading of the above
> is a combination of empty-string and a unique UUID can meet this
> requirement - this is how both our software and Acrobat X and XI behave.
> However it's ambiguous, and this clause was revised in the 2012 revision
> (ISO 16684-1:2011(E) para 7.4) to this:
> If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
> shall be the value of an rdf:about attribute in each top-level
> rdf:Description element. Otherwise, the rdf:about attributes for all top-
> level rdf:Description elements shall be present with an empty value. The
> rdf:about attribute shall not be used in more deeply nested rdf:Description
> elements.
> For compatibility with very early XMP usage, it is recommended that XMP
> readers tolerate a missing rdf:about attribute and treat it as present with
> an empty value. It is also recommended that XMP readers tolerate a mix of
> empty and non-empty rdf:about values, as long as all non-empty values are
> identical.
> Which means that an empty string and a unique UUID are technically
> incorrect, but it's recommended they be tolerated for compatibility
> purposes.
> </snip>
> It might be good to check our interpretation, as
> <snip>
> BFO and Acrobat X and XI think this is valid, PDFBox and
> pdf-tools.com online validator lean the other way and classify this document
> as invalid.
> </snip>
> to see if we should change our interpretation. If there is new input on the pdfa.org mailing list, I'll capture it here too.
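
The tolerant reading recommended in ISO 16684-1 para 7.4 (treat a missing rdf:about as empty, and accept a mix of empty and non-empty values as long as all non-empty values are identical) could be sketched roughly as below. This is only an illustration using the JDK's DOM parser, not the Preflight implementation; the class name RdfAboutCheck and its methods are hypothetical, and it assumes the XMP packet is available as a well-formed XML string.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFBox or Preflight.
final class RdfAboutCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Collects the distinct non-empty rdf:about values found on top-level
    // rdf:Description elements (direct children of rdf:RDF).
    static Set<String> nonEmptyAboutValues(String xmp) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmp)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XMP packet is not well-formed XML", e);
        }
        Set<String> values = new HashSet<>();
        NodeList rdfRoots = doc.getElementsByTagNameNS(RDF_NS, "RDF");
        for (int i = 0; i < rdfRoots.getLength(); i++) {
            for (Node child = rdfRoots.item(i).getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && RDF_NS.equals(child.getNamespaceURI())
                        && "Description".equals(child.getLocalName())) {
                    // getAttributeNS returns "" when the attribute is missing,
                    // which matches "treat it as present with an empty value".
                    String about = ((Element) child).getAttributeNS(RDF_NS, "about");
                    if (!about.isEmpty()) {
                        values.add(about);
                    }
                }
            }
        }
        return values;
    }

    // Tolerant reading: at most one distinct non-empty rdf:about value.
    static boolean isTolerated(String xmp) {
        return nonEmptyAboutValues(xmp).size() <= 1;
    }
}
```

Under this sketch, a packet like the one in test-bfo.pdf (one empty rdf:about plus one UUID) would pass, while two different non-empty values would be rejected. A strict check would additionally require that every top-level rdf:Description carries the attribute with the same value.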



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)