Posted to users@pdfbox.apache.org by Esteban R <er...@hotmail.com> on 2017/07/27 15:49:20 UTC
XMP issue when using fop and shaded jar
I'm working on a project which uses both pdfbox and apache fop (https://xmlgraphics.apache.org/fop/)
CreatePDFA (from the pdfbox examples) generates different (wrong?) XMP metadata when fop is included in the shaded jar. Otherwise (i.e. if a simple jar is generated or if fop is not included) the metadata is ok. The issue seems to be related to the services provided by fop (more details below). How can I solve it? Is it ok to exclude the services provided by fop from the shaded jar?
You can find a sample project here:
<http://www.filedropper.com/tmpxmpissue>http://www.filedropper.com/tmpxmpissue_1
To reproduce, you can use try.sh (included in the project) in a cygwin (Windows 10) environment (it should also work on linux), or you can run:
mvn clean install
java -jar target/tmp_xmp_issue-1.0-SNAPSHOT.jar out.pdf "Hello world" OpenSans-Regular.ttf
Open out.pdf with PDFDebugger and see the XMPMetadata:
<rdf:li lang="x-default">out.pdf</rdf:li>
If you don't include fop (i.e. use pom.xml.nofop) then you will get:
<rdf:li xml:lang="x-default">out.pdf</rdf:li>
The same will happen if you use pom.xml.noservices (which just excludes the services provided by fop).
I use java version "1.8.0_112" and pdfbox 2.0.7.
Re: XMP issue when using fop and shaded jar
Posted by Esteban R <er...@hotmail.com>.
Thanks Andreas. It doesn't seem to solve the issue but it may help. I'm a newbie in the maven world, so sorry if I misunderstood something.
I have been trying to exclude the pdfbox dependency in fop (with the <exclusions> tag):
<exclusions>
  <exclusion>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
  </exclusion>
</exclusions>
but it doesn't help.
In fact "mvn dependency:list" doesn't show pdfbox 2.0.4 (although it is listed in https://mvnrepository.com/artifact/org.apache.xmlgraphics/fop/2.2). Maybe I'm missing something? This is the output from mvn dependency:list:
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building tmp_xmp_issue 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.1:list (default-cli) @ tmp_xmp_issue ---
[INFO]
[INFO] The following files have been resolved:
[INFO] commons-io:commons-io:jar:1.3.1:compile
[INFO] commons-logging:commons-logging:jar:1.2:compile
[INFO] junit:junit:jar:3.8.1:test
[INFO] org.apache.avalon.framework:avalon-framework-api:jar:4.3.1:compile
[INFO] org.apache.avalon.framework:avalon-framework-impl:jar:4.3.1:compile
[INFO] org.apache.pdfbox:fontbox:jar:2.0.7:compile
[INFO] org.apache.pdfbox:pdfbox:jar:2.0.7:compile
[INFO] org.apache.pdfbox:pdfbox-debugger:jar:2.0.7:compile
[INFO] org.apache.pdfbox:pdfbox-tools:jar:2.0.7:compile
[INFO] org.apache.pdfbox:xmpbox:jar:2.0.7:compile
[INFO] org.apache.xmlgraphics:batik-anim:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-awt-util:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-bridge:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-constants:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-css:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-dom:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-ext:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-extension:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-gvt:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-i18n:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-parser:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-script:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-svg-dom:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-svggen:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-transcoder:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-util:jar:1.9:compile
[INFO] org.apache.xmlgraphics:batik-xml:jar:1.9:compile
[INFO] org.apache.xmlgraphics:fop:jar:2.2:compile
[INFO] org.apache.xmlgraphics:xmlgraphics-commons:jar:2.2:compile
[INFO] xalan:serializer:jar:2.7.2:compile
[INFO] xalan:xalan:jar:2.7.2:compile
[INFO] xml-apis:xml-apis:jar:1.3.04:compile
[INFO] xml-apis:xml-apis-ext:jar:1.3.04:compile
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.042s
[INFO] Finished at: Thu Jul 27 15:00:34 ART 2017
[INFO] Final Memory: 9M/155M
[INFO] ------------------------------------------------------------------------
Besides, it doesn't explain (at least to me) why just excluding the "services" from the META-INF solves the issue.
Esteban
________________________________
From: Andreas Lehmkuehler <an...@lehmi.de>
Sent: Thursday, July 27, 2017 05:04 p.m.
To: users@pdfbox.apache.org
Subject: Re: XMP issue when using fop and shaded jar
fop uses PDFBox 2.0.4 so that there are 2 concurrent versions to be put into the
jar. And obviously the older one "wins"
Andreas
Re: XMP issue when using fop and shaded jar
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
fop uses PDFBox 2.0.4 so that there are 2 concurrent versions to be put into the
jar. And obviously the older one "wins"
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: XMP issue when using fop and shaded jar
Posted by Esteban R <er...@hotmail.com>.
Thanks Tilman!
I have used the modified XMPSerializer according to your advice and that solved the issue.
Esteban Ruiz
Re: XMP issue when using fop and shaded jar
Posted by Tilman Hausherr <TH...@t-online.de>.
In Serializer, the namespace is set with:
esimple.setAttributeNS(attribute.getNamespace(), attribute.getName(), attribute.getValue());
I traced it, the parameters are
http://www.w3.org/XML/1998/namespace
lang
x-default
so all is good there. It's the transformer that is the problem.
This code in Serializer.java:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
normally returns a com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
instance. If fop is there, it returns an
org.apache.xalan.transformer.TransformerIdentityImpl instance instead.
javadoc:
https://docs.oracle.com/javase/7/docs/api/javax/xml/transform/TransformerFactory.html#newInstance()
"The Services API will look for a classname in the file
META-INF/services/javax.xml.transform.TransformerFactory in jars
available to the runtime."
fop uses xalan-2.7.2.jar, which ships a
META-INF/services/javax.xml.transform.TransformerFactory file containing
org.apache.xalan.processor.TransformerFactoryImpl. That is why excluding the
services from the shaded jar brings back the default transformer.
So the question is now, who is right? Is it a bug in xalan or in the
calling code, i.e. do we have to set some option?
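
To check which implementation wins in a given jar, something like the following minimal sketch may help. It only uses standard JDK calls; the class name TransformerLookupSketch and its helper methods are made up for illustration and are not part of pdfbox or fop. Note that on newer JDKs the built-in implementation is not registered through a services file, so the list may be empty even though a factory is found.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper class, for illustration only.
class TransformerLookupSketch {
    static final String SERVICE_FILE =
            "META-INF/services/javax.xml.transform.TransformerFactory";

    // Lists every services file for TransformerFactory visible on the classpath.
    static List<URL> serviceFiles() {
        try {
            ClassLoader cl = Thread.currentThread().getContextClassLoader();
            if (cl == null) {
                cl = ClassLoader.getSystemClassLoader();
            }
            return Collections.list(cl.getResources(SERVICE_FILE));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the class name of the factory that newInstance() actually picks.
    static String resolvedFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        for (URL url : serviceFiles()) {
            System.out.println("services file: " + url);
        }
        System.out.println("resolved factory: " + resolvedFactory());
    }
}
```

Running this once from the shaded jar and once from the plain jar should show whether the xalan services entry is what changes the resolved factory.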
We could force the default implementation by changing the code in "save" to
Transformer transformer = TransformerFactory.newInstance(
    "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null).newTransformer();
Or you can set a system property:
System.setProperty("javax.xml.transform.TransformerFactory",
"com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");
However maybe this will mess up something in FOP :-(
So I guess the best for you would be to copy the source code of
Serializer, and to change the newInstance call as described. Make sure
this works for all java versions.
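
As a rough, standalone way to try the forced factory outside of pdfbox, the sketch below builds a single rdf:li element with the same namespace/name/value triple traced above and serializes it with the JDK-internal factory. The class ForcedTransformerSketch and its method are hypothetical names, and this is not the actual Serializer code, only an approximation of the relevant calls. Whether the internal factory class is available depends on the JDK in use, so treat it as an assumption to verify.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical test class, for illustration only.
class ForcedTransformerSketch {
    // JDK-internal implementation named in this thread; assumed to exist in the running JDK.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    // Builds <rdf:li ...>out.pdf</rdf:li> and serializes it with the forced factory.
    static String serialize() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element li = doc.createElementNS(
                    "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:li");
            // Same namespace, name and value as traced in the message above.
            li.setAttributeNS("http://www.w3.org/XML/1998/namespace", "lang", "x-default");
            li.setTextContent("out.pdf");
            doc.appendChild(li);

            // Bypass the services lookup by naming the factory class explicitly.
            Transformer transformer =
                    TransformerFactory.newInstance(JDK_FACTORY, null).newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```

If the printed attribute reads xml:lang even when run from the shaded jar with fop included, the forced factory is doing its job. On Java 9 and later, TransformerFactory.newDefaultInstance() is another way to get the built-in implementation without naming an internal class.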
Tilman