Posted to dev@pdfbox.apache.org by "Martijn Brinkers (JIRA)" <ji...@apache.org> on 2010/12/01 18:21:13 UTC

[jira] Closed: (PDFBOX-855) Extracted Text of MS Word generated PDFs corrupt

     [ https://issues.apache.org/jira/browse/PDFBOX-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn Brinkers closed PDFBOX-855.
-----------------------------------

    Resolution: Incomplete

The bug reporter did not provide an example PDF document or any further evidence of the problem.

> Extracted Text of MS Word generated PDFs corrupt
> ------------------------------------------------
>
>                 Key: PDFBOX-855
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-855
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.3.1
>         Environment: All
>            Reporter: Hendrik Lescak
>
> Since revision 1003195 (PDFBOX-828: fixed some issues with positioning when extracting or rendering text), text extraction with PDFTextStripper behaves differently for PDF documents generated with the "Save as PDF" feature of MS Office Word 2007: spaces are inserted inside words.
> For example, the term "Fachbereichsleiter" is extracted as "F a c hb e re ic hsle ite r" since PDFBOX-828.
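The report does not explain why the spacing changed, and the issue was closed without a sample PDF, so the cause was never confirmed. As general background only, text extractors commonly decide where to insert spaces by comparing the horizontal gap between consecutive glyphs with a threshold, so changes to glyph positions or widths directly change the extracted text. The Java sketch below is a simplified, hypothetical model of that idea, not PDFBox code: the `Glyph` record, the `layout` and `join` methods, and the 0.5 tolerance are all illustrative assumptions. It shows how glyph widths that are understated relative to glyph positions turn every gap into a space. It does not reproduce the irregular split quoted in the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of gap-based word separation.
// This is NOT PDFBox's PDFTextStripper implementation; all names and the
// threshold rule are illustrative assumptions.
public class GapSpacingSketch {

    // A glyph with its left edge x and the width the extractor believes it has
    // (hypothetical type, arbitrary text-space units).
    record Glyph(char ch, float x, float width) {}

    // Places each character `advance` units after the previous one but records
    // `reportedWidth` as its width. When reportedWidth < advance, the extractor
    // sees a gap after every glyph, as it would if widths or positions were
    // computed inconsistently.
    static List<Glyph> layout(String text, float advance, float reportedWidth) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), i * advance, reportedWidth));
        }
        return glyphs;
    }

    // Joins glyphs into text, inserting a space whenever the gap between the end
    // of one glyph and the start of the next exceeds tolerance * average width.
    static String join(List<Glyph> glyphs, float tolerance) {
        if (glyphs.isEmpty()) {
            return "";
        }
        float avgWidth = 0f;
        for (Glyph g : glyphs) {
            avgWidth += g.width();
        }
        avgWidth /= glyphs.size();

        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x() - (prev.x() + prev.width());
                if (gap > tolerance * avgWidth) {
                    out.append(' ');
                }
            }
            out.append(g.ch());
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Consistent widths and positions: the word stays intact.
        System.out.println(join(layout("Fachbereichsleiter", 10f, 10f), 0.5f));
        // Widths understated relative to positions: every gap looks like a space.
        System.out.println(join(layout("Fachbereichsleiter", 10f, 5f), 0.5f));
    }
}
```

Running `main` prints `Fachbereichsleiter` for the consistent layout and `F a c h b e r e i c h s l e i t e r` for the inconsistent one. A real extractor also considers font space widths, sorting, and line breaks, which this sketch deliberately leaves out.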

-- 
This message is automatically generated by JIRA.