Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/11/01 01:17:23 UTC

[jira] Commented: (TIKA-531) xmpTPg:NPages creates invalid XML

    [ https://issues.apache.org/jira/browse/TIKA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926796#action_12926796 ] 

Jukka Zitting commented on TIKA-531:
------------------------------------

How is the output invalid XML? The name attribute in <meta name="xmpTPg:NPages" content="..."/> is defined as a plain CDATA attribute by XHTML, so a parser shouldn't try to parse its contents as an XML name.

Note that down the line we may want to switch to something like RDFa for serializing metadata attributes, but for now the metadata names should be treated just as plain strings, even though the XMP ones look like XML names with their prefixes.
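
The point about CDATA attribute values can be checked with any namespace-aware XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than Tika itself; the sample documents, the content value "3", and the helper names read_meta and is_well_formed are assumptions made up for illustration. It parses a <meta> element whose name attribute contains a colon, and contrasts it with the same string used as an element name without a namespace declaration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample output; the content value "3" is invented for the example.
META_AS_ATTRIBUTE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta name="xmpTPg:NPages" content="3"/>'
    '</head><body/></html>'
)

# Contrast case: the same string used as an element name,
# with no xmlns:xmpTPg declaration anywhere in the document.
META_AS_ELEMENT = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<xmpTPg:NPages>3</xmpTPg:NPages>'
    '</head><body/></html>'
)


def read_meta(xml_text):
    """Return {name: content} for every XHTML <meta> element in the document."""
    root = ET.fromstring(xml_text)
    return {
        meta.get("name"): meta.get("content")
        for meta in root.iter("{%s}meta" % XHTML_NS)
    }


def is_well_formed(xml_text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(read_meta(META_AS_ATTRIBUTE))
    print(is_well_formed(META_AS_ATTRIBUTE))
    print(is_well_formed(META_AS_ELEMENT))
```

In this sketch the attribute form parses and the name comes back as an ordinary string, while the element form is rejected because the prefix is unbound. A missing namespace declaration therefore only matters when the prefixed name appears as an element or attribute name, not inside an attribute value.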

> xmpTPg:NPages creates invalid XML
> ---------------------------------
>
>                 Key: TIKA-531
>                 URL: https://issues.apache.org/jira/browse/TIKA-531
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 0.8
>            Reporter: Sjoerd Smeets
>             Fix For: 0.8
>
>
> Hi,
> Parsing MS Office files or PDF documents results in invalid XML, as there is a missing namespace definition for xmpTPg:NPages. What would be the best approach: renaming this field, or adding the namespace definition to the header of the output XML?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.