Posted to dev@pdfbox.apache.org by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org> on 2012/01/27 13:04:40 UTC

[jira] [Commented] (PDFBOX-1213) Adding style information to the PDF to HTML converter

    [ https://issues.apache.org/jira/browse/PDFBOX-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194619#comment-13194619 ] 

Timo Boehme commented on PDFBOX-1213:
-------------------------------------

I cannot see why the DOCTYPE declaration is a problem. Maybe something is wrong with your SAX parser configuration, e.g. it is trying to read the DTD? At the very least, whether the DOCTYPE is added should be made configurable.
To make subsequent XML processing easier, I would propose changing the HTML doctype to XHTML.
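
As a rough illustration of the parser-configuration point, the sketch below parses converter-style output with a standard JAXP DocumentBuilder while keeping it from reading loose.dtd. The class name DtdFreeParse, the helper parseIgnoringDtd, and the sample markup are hypothetical and not part of PDFBox or the attached patch; the DOCTYPE string is an assumption based on the loose.dtd reference in the reported error. It only shows one possible way to keep the DOCTYPE in the output and still parse it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DtdFreeParse {

    // Hypothetical helper (not a PDFBox API): parses markup without
    // reading the DTD that its DOCTYPE declaration points to.
    static Document parseIgnoringDtd(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve every external entity, including the external DTD,
            // to an empty stream so loose.dtd is never fetched or parsed.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample output; the DOCTYPE is the HTML 4.01 Transitional one.
        String html = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
                + "\"http://www.w3.org/TR/html4/loose.dtd\">"
                + "<html><head><title>t</title></head>"
                + "<body><p><b>bold</b></p></body></html>";
        Document doc = parseIgnoringDtd(html);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Note that with the DTD suppressed, named entities such as &nbsp; are no longer declared, so this approach only suits output that avoids them or uses numeric character references.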
                
> Adding style information to the PDF to HTML converter
> -----------------------------------------------------
>
>                 Key: PDFBOX-1213
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1213
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Enrique Pérez
>         Attachments: diff.patch
>
>
> This patch modifies the PDF to HTML conversion in order to add style information (bold, italic and font size) to the resulting file. Moreover, we have deleted the "DOCTYPE" header because some parsers throw the following exception:
> [Fatal Error] loose.dtd:31:3: The declaration for the entity "HTML.Version" must end with '>'.
> org.xml.sax.SAXParseException: The declaration for the entity "HTML.Version" must end with '>'.
