Posted to dev@pdfbox.apache.org by "Guillaume Bailleul (JIRA)" <ji...@apache.org> on 2014/06/20 23:00:25 UTC
[jira] [Resolved] (PDFBOX-1995) AdobePDFSchema.getProducer() returns empty string

     [ https://issues.apache.org/jira/browse/PDFBOX-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillaume Bailleul resolved PDFBOX-1995.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0

Fix and new test added in r1604276

> AdobePDFSchema.getProducer() returns empty string
> -------------------------------------------------
>
>                 Key: PDFBOX-1995
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1995
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 1.8.4
>            Reporter: Alexandre Garino
>            Assignee: Guillaume Bailleul
>             Fix For: 2.0.0
>
>
> I ran into this bug during the PDF/A validation process. The document is not considered valid because the producer value in the XMP metadata is not in sync with PDDocumentInformation.
> {quote}
> PDDocumentInformation.getProducer() = ` ' (one space)
> AdobePDFSchema.getProducer() = `' (empty)
> {quote}
> Below is the metadata extracted from the PDF document:
>  
> {quote}
> <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
> <x:xmpmeta xmlns:x="adobe:ns:meta/">
>     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>         <rdf:Description rdf:about="" xmlns:xap="http://ns.adobe.com/xap/1.0/">
>             <xap:CreatorTool>Canon </xap:CreatorTool>
>             <xap:CreateDate>2014-01-23T20:09:45+01:00</xap:CreateDate>
>         </rdf:Description>
>         <rdf:Description rdf:about=""  xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
>             <pdf:Producer> </pdf:Producer>
>         </rdf:Description>
>         <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
>             <pdfaid:part>1</pdfaid:part>
>             <pdfaid:conformance>B</pdfaid:conformance>
>         </rdf:Description>
>     </rdf:RDF>
> </x:xmpmeta>
> <?xpacket end="w"?>
> {quote}
> As you can see, the Producer value should be equal to ` ' (one space).
> The bug is located in the method DomXmpParser.removeComments. This method is invoked during the unmarshalling process and removes much more than comments: it also removes every text node that contains only whitespace!
> I can fix (badly) MY issue by changing the code from:
> {quote}
>                 Text t = (Text) node;
>                 if (t.getTextContent().trim().length() == 0)
>                 {
>                     // XXX is there a better way to remove useless Text ?
>                     node.getParentNode().removeChild(node);
>                 }
> {quote}
> into:
> {quote}
>                 Text t = (Text) node;
>                 if (t.getTextContent().startsWith("\n"))
>                 {
>                     // XXX is there a better way to remove useless Text ?
>                     node.getParentNode().removeChild(node);
>                 }
> {quote}
> But this is not a long-term fix.
> IMHO, the unmarshalling process should be reworked.
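
For readers who want to reproduce the behaviour outside XmpBox, the sketch below uses only the JDK DOM API. It is not the change committed in r1604276, and it is not the XmpBox code itself: the class name WhitespaceStripSketch, the helper names, and the keepLeafText flag are all hypothetical. With the flag off it mimics the trim()-based check quoted above. With the flag on it tries one possible alternative, assuming that whitespace-only text is mere layout only when its parent also has element children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names exist in XmpBox.
class WhitespaceStripSketch {

    // Reduced version of the XMP packet from the report.
    static final String XMP =
          "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
        + "  <rdf:Description rdf:about=\"\" xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\">\n"
        + "    <pdf:Producer> </pdf:Producer>\n"
        + "  </rdf:Description>\n"
        + "</rdf:RDF>";

    // True when the parent has at least one element child.
    static boolean hasElementChild(Node parent) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // keepLeafText == false: remove every whitespace-only text node
    //                        (same test as the trim() check in the report).
    // keepLeafText == true:  remove it only when the parent also has element
    //                        children (assumption: then it is just indentation).
    static void strip(Node parent, boolean keepLeafText) {
        NodeList children = parent.getChildNodes();
        // Walk backwards so removals do not shift the indexes still to visit.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.TEXT_NODE) {
                boolean blank = node.getTextContent().trim().isEmpty();
                boolean layoutOnly = !keepLeafText || hasElementChild(parent);
                if (blank && layoutOnly) {
                    parent.removeChild(node);
                }
            } else if (node.getNodeType() == Node.ELEMENT_NODE) {
                strip(node, keepLeafText);
            }
        }
    }

    // Parses XMP, strips whitespace, and returns the text of pdf:Producer.
    static String producerAfterStrip(boolean keepLeafText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XMP)));
            strip(doc.getDocumentElement(), keepLeafText);
            Node producer = doc
                    .getElementsByTagNameNS("http://ns.adobe.com/pdf/1.3/", "Producer")
                    .item(0);
            return producer.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + producerAfterStrip(false) + "]"); // prints []
        System.out.println("[" + producerAfterStrip(true) + "]");  // prints [ ]
    }
}
```

The first call loses the single space, which matches the empty string reported for AdobePDFSchema.getProducer(). The second call keeps it while still dropping the indentation between elements. Whether that rule is right for every XMP structure (for example, properties with mixed content) is left open here.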



--
This message was sent by Atlassian JIRA
(v6.2#6252)