Posted to users@pdfbox.apache.org by Jörg Lenhard <jo...@uni-bamberg.de> on 2016/04/11 16:32:56 UTC

Parsing xmp with custom namespaces with xmpbox

Dear pdfbox team,

I am working on an application (JabRef) that has so far relied on pdfbox 
and jempbox for pdf and xmp handling. I am trying to update to pdfbox 
2.0.0 and to replace jempbox with xmpbox and am having some trouble.

The problem is that we encode custom, non-standard data in the XMP 
metadata. jempbox serializes this data just fine (using the class 
XMLUtil), and I can serialize it just as well with xmpbox (using 
XmpSerializer), but I cannot get parsing to work (with DomXmpParser). 
Maybe I am overlooking something trivial, or maybe xmpbox does not 
support custom namespaces? Is there any way to make DomXmpParser aware 
of my custom namespace?

Below please find some example data, which I would like to be able to 
parse and the resulting stacktrace. The exception is: 
org.apache.xmpbox.xml.XmpParsingException: Cannot find a definition for 
the namespace http://jabref.sourceforge.net/bibteXMP/

My code for parsing is:
// testData is a String with xmp data
InputStream is = new ByteArrayInputStream(testData.getBytes(StandardCharsets.UTF_8));
DomXmpParser parser = new DomXmpParser();
XMPMetadata meta = parser.parse(is);

Any help will be very much appreciated.

Best regards

Jörg Lenhard

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta 
xmlns:x="adobe:ns:meta/">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" 
rdf:about="">
       <dc:description>
         <rdf:Alt>
           <rdf:li xml:lang="x-default">The success of the Linux 
operating system has demonstrated the viability of an alternative form 
of software development – open source software – that challenges 
traditional assumptions about software markets. Understanding what 
drives open source developers to participate in open source projects is 
crucial for assessing the impact of open source software. This article 
identifies two broad types of motivations that account for their 
participation in open source projects. The first category includes 
internal factors such as intrinsic motivation and altruism, and the 
second category focuses on external rewards such as expected future 
returns and personal needs. This article also reports the results of a 
survey administered to open source programmers.</rdf:li>
         </rdf:Alt>
       </dc:description>
       <dc:creator>
         <rdf:Seq>
           <rdf:li>Kelly Clarkson</rdf:li>
           <rdf:li>Ozzy Osbourne</rdf:li>
         </rdf:Seq>
       </dc:creator>
       <dc:relation>
         <rdf:Bag>
           <rdf:li>bibtex/bibtexkey/Clarkson06</rdf:li>
           <rdf:li>bibtex/booktitle/Catch-22</rdf:li>
           <rdf:li>bibtex/journal/International Journal of High 
Fidelity</rdf:li>
           <rdf:li>bibtex/pdf/YeKis03 - Towards.pdf</rdf:li>
         </rdf:Bag>
       </dc:relation>
       <dc:contributor>
         <rdf:Bag>
           <rdf:li>Huey Duck</rdf:li>
           <rdf:li>Dewey Duck</rdf:li>
           <rdf:li>Louie Duck</rdf:li>
         </rdf:Bag>
       </dc:contributor>
       <dc:subject>
         <rdf:Bag>
           <rdf:li>peanut</rdf:li>
           <rdf:li>butter</rdf:li>
           <rdf:li>jelly</rdf:li>
         </rdf:Bag>
       </dc:subject>
       <dc:title>
         <rdf:Alt>
           <rdf:li xml:lang="x-default">Hypersonic ultra-sound</rdf:li>
         </rdf:Alt>
       </dc:title>
       <dc:date>
         <rdf:Seq>
           <rdf:li>1982-07</rdf:li>
         </rdf:Seq>
       </dc:date>
       <dc:format>application/pdf</dc:format>
       <dc:type>
         <rdf:Bag>
           <rdf:li>InProceedings</rdf:li>
         </rdf:Bag>
       </dc:type>
     </rdf:Description>
     <rdf:Description 
xmlns:bibtex="http://jabref.sourceforge.net/bibteXMP/" rdf:about="">
       <bibtex:abstract>The success of the Linux operating system has 
demonstrated the viability of an alternative form of software 
development – open source software – that challenges traditional 
assumptions about software markets. Understanding what drives open 
source developers to participate in open source projects is crucial for 
assessing the impact of open source software. This article identifies 
two broad types of motivations that account for their participation in 
open source projects. The first category includes internal factors such 
as intrinsic motivation and altruism, and the second category focuses on 
external rewards such as expected future returns and personal needs. 
This article also reports the results of a survey administered to open 
source programmers.</bibtex:abstract>
       <bibtex:author>
         <rdf:Seq>
           <rdf:li>Kelly Clarkson</rdf:li>
           <rdf:li>Ozzy Osbourne</rdf:li>
         </rdf:Seq>
       </bibtex:author>
       <bibtex:bibtexkey>Clarkson06</bibtex:bibtexkey>
       <bibtex:booktitle>Catch-22</bibtex:booktitle>
       <bibtex:editor>
         <rdf:Seq>
           <rdf:li>Huey Duck</rdf:li>
           <rdf:li>Dewey Duck</rdf:li>
           <rdf:li>Louie Duck</rdf:li>
         </rdf:Seq>
       </bibtex:editor>
       <bibtex:journal>International Journal of High 
Fidelity</bibtex:journal>
       <bibtex:keywords>peanut, butter, jelly</bibtex:keywords>
       <bibtex:month>#jul#</bibtex:month>
       <bibtex:pdf>YeKis03 - Towards.pdf</bibtex:pdf>
       <bibtex:title>Hypersonic ultra-sound</bibtex:title>
       <bibtex:year>1982</bibtex:year>
       <bibtex:entrytype>inproceedings</bibtex:entrytype>
     </rdf:Description>
   </rdf:RDF>
</x:xmpmeta><?xpacket end="w"?>

The exception:

Cannot find a definition for the namespace http://jabref.sourceforge.net/bibteXMP/
org.apache.xmpbox.xml.XmpParsingException: Cannot find a definition for the namespace http://jabref.sourceforge.net/bibteXMP/
     at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:853)
     at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:290)
     at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:234)
     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:198)
     at net.sf.jabref.logic.xmp.XMPUtilTest.testReadWriteDC(XMPUtilTest.java:902)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:497)
     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:112)
     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:497)
     at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
     at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
     at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
     at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
     at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
     at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:497)
     at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
     at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
     at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:360)
     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
     at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)



Re: Parsing xmp with custom namespaces with xmpbox

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

This is a known problem: xmpbox does not support custom namespaces. 
This was noticed too late (xmpbox is closely related to preflight, which 
checks for PDF/A). It is on the list of things to discuss for 2.1:

"- discussion/decision on XMP (shall we enhance XMPBox, restore Jempbox, 
base on Adobe's XMP library, join forces with the FOP project …)"

Until then, the workaround is to keep using jempbox.

Tilman
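
For read-only access to the custom properties, one possible stopgap is 
to skip both XMP libraries and read the packet with the JDK's own 
namespace-aware DOM parser. The following is only a minimal sketch under 
that assumption, not something xmpbox or jempbox provides: the class 
name BibtexXmpReader and its method are hypothetical, and joining 
rdf:li items with " and " is an assumed convention borrowed from BibTeX 
author lists. It does no schema validation at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads properties of one custom namespace from an XMP packet.
final class BibtexXmpReader {
    static final String BIBTEX_NS = "http://jabref.sourceforge.net/bibteXMP/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Maps each bibtex:* element's local name to its text value. */
    static Map<String, String> readBibtexProperties(String xmp) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }

        Map<String, String> result = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(BIBTEX_NS, "*");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList items = prop.getElementsByTagNameNS(RDF_NS, "li");
            String value;
            if (items.getLength() > 0) {
                // rdf:Seq / rdf:Bag / rdf:Alt: join the items (assumed separator)
                StringBuilder joined = new StringBuilder();
                for (int j = 0; j < items.getLength(); j++) {
                    if (j > 0) {
                        joined.append(" and ");
                    }
                    joined.append(clean(items.item(j).getTextContent()));
                }
                value = joined.toString();
            } else {
                value = clean(prop.getTextContent());
            }
            result.put(prop.getLocalName(), value);
        }
        return result;
    }

    // Collapses line wrapping and indentation inside a value to single spaces.
    private static String clean(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String xmp = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
                + "<rdf:Description xmlns:bibtex=\"" + BIBTEX_NS + "\" rdf:about=\"\">"
                + "<bibtex:author><rdf:Seq><rdf:li>Kelly Clarkson</rdf:li>"
                + "<rdf:li>Ozzy Osbourne</rdf:li></rdf:Seq></bibtex:author>"
                + "<bibtex:year>1982</bibtex:year>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(readBibtexProperties(xmp));
    }
}
```

This only covers reading. Writing the metadata back and handling the 
standard schemas (Dublin Core etc.) would still need jempbox or xmpbox.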



