Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/12/14 20:55:05 UTC

[jira] Updated: (PDFBOX-759) Special characters not extracted

     [ https://issues.apache.org/jira/browse/PDFBOX-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-759:
--------------------------------------

    Attachment: PDFBOX759-Mathematik_Stochastik.txt

> Special characters not extracted
> --------------------------------
>
>                 Key: PDFBOX-759
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-759
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0, 1.2.0
>         Environment: all
>            Reporter: Sebastian Freuck
>         Attachments: Mathematik_Stochastik.pdf, PDFBOX759-Mathematik_Stochastik.txt
>
>
> When extracting characters from mathematical formulas, many characters appear that do not seem to have any meaning.
> Take the last formula on page 80, the one with the binomial coefficient, as an example. When extracted with Foxit Reader or Adobe Reader, the opening bracket is a character with the int value 18 and the closing bracket is one with the int value 19. When I look at the TextPosition objects using PDFBox, there is one character to the left of the 5, and it has the glyph name spacehackarabic/space and the int value 32.
> The next problem is that there seems to be a character at the same position as the 5, a 'controlLF'. What is it doing at the same position as that number?
> Now, after the character 2 there are 3 other characters: another 'controlLF' and two 'spacehackarabic/space'. There is no indication whatsoever of the bracket. What do those extra characters mean? And why doesn't PDFBox show the character for the bracket that I am able to extract using the other PDF readers?
> The PDF can be downloaded from http://upload.wikimedia.org/wikibooks/de/f/f6/Mathematik_Stochastik.pdf
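
To compare the output of different extractors, it can help to list the int value of every character in the extracted text, so that values such as 18, 19 or 32 become visible. The following is only a minimal sketch in plain Java, independent of PDFBox; the class name CodePointDump, the method describe and the sample string are hypothetical and not taken from the attached PDF.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {
    // Returns one line per code point: position, decimal value, and a label.
    // Control characters and whitespace get a placeholder label because
    // they are invisible when printed directly.
    static List<String> describe(String extracted) {
        List<String> lines = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < extracted.length(); ) {
            int cp = extracted.codePointAt(i);
            String label;
            if (Character.isISOControl(cp)) {
                label = "<control>";
            } else if (Character.isWhitespace(cp)) {
                label = "<space>";
            } else {
                label = new String(Character.toChars(cp));
            }
            lines.add(position + ": " + cp + " " + label);
            i += Character.charCount(cp);
            position++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the values 18 and 19 mentioned in the report,
        // wrapped around two digits separated by a space.
        String sample = "\u0012" + "5 2" + "\u0013";
        for (String line : describe(sample)) {
            System.out.println(line);
        }
    }
}
```

Running the same dump on the text produced by PDFBox and on the text copied from another reader should show where the two outputs differ.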

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.