Posted to users@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2015/07/07 20:11:18 UTC
Re: FW: xmp parsing issue -- xmp should start with a processing instruction
Hi,
We became more restrictive in 2.0 after running the Bavaria PDF/A tests. That
file is missing the "<?xpacket " processing instruction at the beginning.
Tilman
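
For illustration, here is a minimal sketch of what adding the missing wrapper could look like. It uses only the JDK; the class and method names (XpacketWrapper, hasXpacketHeader, ensureWrapped) are hypothetical and not part of PDFBox or xmpbox. The header and trailer strings are the standard XMP packet wrapper. Whether the 2.0 parser then accepts this particular test file has not been verified here, since other parts of the XMP may also be rejected.

```java
class XpacketWrapper {
    // Standard XMP packet header and trailer; the begin attribute holds U+FEFF.
    static final String HEADER =
            "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>";
    static final String TRAILER = "<?xpacket end=\"w\"?>";

    // True if the text (ignoring leading whitespace) starts with the
    // "<?xpacket " processing instruction mentioned above.
    static boolean hasXpacketHeader(String xmp) {
        return xmp.trim().startsWith("<?xpacket ");
    }

    // Hypothetical helper: wraps a bare x:xmpmeta body in the packet
    // header and trailer; input that is already wrapped is returned unchanged.
    static String ensureWrapped(String xmp) {
        if (hasXpacketHeader(xmp)) {
            return xmp;
        }
        return HEADER + "\n" + xmp.trim() + "\n" + TRAILER;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the XMP quoted in this thread.
        String bare = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
                + "</x:xmpmeta>";
        System.out.println(hasXpacketHeader(bare));
        System.out.println(ensureWrapped(bare));
    }
}
```

This only patches the text of an XMP packet; a regenerated test PDF would still need the corrected packet written back into its metadata stream.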
Am 07.07.2015 um 20:07 schrieb Allison, Timothy B.:
> All,
> This is a separate issue from the one I raised in PDFBox-2855. This, too, was initially noted by Jeremy Anderson on TIKA-1285. I'm not sure if this is a problem with the way our xmp was generated or with the xmp parser. I'm fairly confident it is the former, but wanted to check.
>
> In our test suite, we have a file that is intended to test multi-lingual titles in xmp. I _think_ we generated this file with an older version of PDFBox+jempbox (vintage 1.6???), but I can't remember any more.
>
> The XMP is:
>
> <x:xmpmeta xmlns:x="adobe:ns:meta/">
> <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>
> <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
> <dc:creator rdf:resource="mailto:plindenbaum@yahoo.fr"/>
> <dc:title>
> <rdf:Alt>
> <rdf:li xml:lang="x-default">Hello World</rdf:li>
> <rdf:li xml:lang="fr-ca">Bonjour World</rdf:li>
> <rdf:li xml:lang="zh-cn">????</rdf:li>
> </rdf:Alt>
> </dc:title>
> <dc:date>2010-07-11</dc:date>
> </rdf:Description>
> </rdf:RDF>
> </x:xmpmeta>
>
> The stacktrace is:
> org.apache.xmpbox.xml.XmpParsingException: xmp should start with a processing instruction
> at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:135)
> at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:207)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:127)
>
>
> The original PDF file is available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/testPDFTripleLangTitle.pdf
>
> This file was parsed without a problem by jempbox, but we get the exception in 2.0.0. Should I open an issue for this, or is this user error, meaning we need to regenerate our test file to yield correct xmp?
>
> Thank you.
>
> Best,
>
> Tim
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
RE: FW: xmp parsing issue -- xmp should start with a processing instruction
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you, Tilman. I will regenerate the test file.