Posted to dev@pdfbox.apache.org by "Laurent Richard (JIRA)" <ji...@apache.org> on 2014/10/27 14:16:34 UTC

[jira] [Comment Edited] (PDFBOX-2419) XFDF export is not XML compliant

    [ https://issues.apache.org/jira/browse/PDFBOX-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185117#comment-14185117 ] 

Laurent Richard edited comment on PDFBOX-2419 at 10/27/14 1:15 PM:
-------------------------------------------------------------------

The problem is specific to the XML-based XFDF format, where special characters must be escaped. FDF export is not affected.
I have attached a sample PDF file with a simple AcroForm whose field values contain such characters.
With code like
{code}
// Load the attached sample form (PDFBox 1.8.x API).
PDDocument pdf = PDDocument.load("SampleForm.pdf");
PDAcroForm form = pdf.getDocumentCatalog().getAcroForm();
// Export the AcroForm field values as an FDF document.
FDFDocument fdf = form.exportFDF();
// Serialize the FDF document as XFDF (XML) into a string.
StringWriter writer = new StringWriter();
fdf.saveXFDF(writer);
return writer.toString();
{code}
We get the following content
{code}
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<ids original="40DE256FBEC20B428C72BCF68015AB9E" modified="3E3C2606FFB360C3FA74D3921A630318" />
<fields>
<field name="choix_1_J-64ncBSov7NcPeTY8oJ3A">
<value>Oui</value>
</field>
<field name="prenom_By11Gk3puTlnwwnv4WA0-g">
<value>special XML characters &lt; &gt; &amp;</value>
</field>
<field name="nom_yQacEuz649N*BJguviO5Ow">
<value>special XML characters < > &</value>
</field>
</fields>
</xfdf>
{code}
which is not well-formed XML, since '<' and '&' must be escaped in element content (and '>' is conventionally escaped as well). The correct result would be:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<ids original="40DE256FBEC20B428C72BCF68015AB9E" modified="3E3C2606FFB360C3FA74D3921A630318" />
<fields>
<field name="choix_1_J-64ncBSov7NcPeTY8oJ3A">
<value>Oui</value>
</field>
<field name="prenom_By11Gk3puTlnwwnv4WA0-g">
<value>special XML characters &amp;lt; &amp;gt; &amp;amp;</value>
</field>
<field name="nom_yQacEuz649N*BJguviO5Ow">
<value>special XML characters &lt; &gt; &amp;</value>
</field>
</fields>
</xfdf>
{code}
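One way to confirm the difference between the two outputs above is to run each through a standard JAXP parser. The following is only a minimal sketch: the class and method names (XfdfWellFormedCheck, isWellFormed) are hypothetical and not part of PDFBox, and the check covers well-formedness only, not conformance to the XFDF schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of PDFBox): checks whether a string parses as XML. */
final class XfdfWellFormedCheck {

    /** Returns true if the JAXP DOM parser accepts the string, false on a parse error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for unescaped '<' or '&' in element content, among other errors.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        String unescaped = "<value>special XML characters < > &</value>";
        String escaped = "<value>special XML characters &lt; &gt; &amp;</value>";
        System.out.println("unescaped well-formed: " + isWellFormed(unescaped));
        System.out.println("escaped well-formed:   " + isWellFormed(escaped));
    }
}
```

Passing the string returned by saveXFDF to such a check in a unit test would catch this kind of regression.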
Ideally, building the output with JAXP (Java API for XML Processing) instead of assembling a String by hand would take care of this escaping automatically.
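As an illustration of the JAXP suggestion, here is a minimal sketch that writes field values through the StAX XMLStreamWriter, which escapes text and attribute values itself. It is not the PDFBox implementation and not necessarily how the issue should be fixed: the class XfdfStaxSketch, its toXfdf method and the plain name/value map are assumptions made for the example, and the ids element and xml:space attribute are left out for brevity.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper (not part of PDFBox): writes XFDF-like output through StAX. */
final class XfdfStaxSketch {

    private static final String XFDF_NS = "http://ns.adobe.com/xfdf/";

    /** Serializes name/value pairs; escaping is left entirely to the stream writer. */
    static String toXfdf(Map<String, String> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("xfdf");
            xml.writeDefaultNamespace(XFDF_NS);
            xml.writeStartElement("fields");
            for (Map.Entry<String, String> field : fields.entrySet()) {
                xml.writeStartElement("field");
                xml.writeAttribute("name", field.getKey());
                xml.writeStartElement("value");
                // writeCharacters escapes '<' and '&' (and, in the JDK writer, '>').
                xml.writeCharacters(field.getValue());
                xml.writeEndElement(); // value
                xml.writeEndElement(); // field
            }
            xml.writeEndElement(); // fields
            xml.writeEndElement(); // xfdf
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XFDF", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("prenom_By11Gk3puTlnwwnv4WA0-g", "special XML characters &lt; &gt; &amp;");
        fields.put("nom_yQacEuz649N*BJguviO5Ow", "special XML characters < > &");
        System.out.println(toXfdf(fields));
    }
}
```

Because the writer receives the raw field value, a value that already contains the literal text "&lt;" is escaped again to "&amp;lt;", which matches the expected output shown above.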



> XFDF export is not XML compliant
> --------------------------------
>
>                 Key: PDFBOX-2419
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2419
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 1.8.7
>            Reporter: Laurent Richard
>              Labels: FDF
>             Fix For: 1.8.8
>
>         Attachments: SampleForm.pdf
>
>
> The XFDF content is written as a simple string instead of XML nodes.
> As a result, field values containing special characters (&, <, >, ...) are not escaped and the resulting XML is invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)