Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/01/10 20:17:00 UTC

[jira] [Comment Edited] (PDFBOX-4431) PDFBox recognizes only a few words

    [ https://issues.apache.org/jira/browse/PDFBOX-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739746#comment-16739746 ] 

Tilman Hausherr edited comment on PDFBOX-4431 at 1/10/19 8:16 PM:
------------------------------------------------------------------

That code is built on top of PDFBox; it isn't authored by the PDFBox team. You should ask the author of that tool, i.e. Leslie Lau, or debug it yourself to find out why it isn't working, e.g. by looking at what's coming into writeString. I'd be glad to help if you can trace this to a bug in PDFBox, but that doesn't look likely considering the text file I attached.
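
One way to look at what's coming into writeString is to print every incoming string with visible delimiters and its code points, then run the search on that same string. The sketch below is only an illustration and uses nothing but the JDK: the class WriteStringDebug and its methods describe and findAll are hypothetical names, not part of PDFBox or of the tool in question, and the sample strings in main are made up. In a real run you would call describe(text) from an override of PDFTextStripper.writeString(String text, List<TextPosition> textPositions) instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical debugging helper; not part of PDFBox. */
final class WriteStringDebug {

    /** Renders a chunk with visible delimiters, its length and each code point. */
    static String describe(String chunk) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(chunk).append("] len=").append(chunk.length()).append(" cps=");
        chunk.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    /** Returns every start offset of term inside chunk (case-sensitive). */
    static List<Integer> findAll(String chunk, String term) {
        List<Integer> hits = new ArrayList<>();
        if (term.isEmpty()) {
            return hits; // an empty term would match everywhere; treat it as no match
        }
        for (int i = chunk.indexOf(term); i >= 0; i = chunk.indexOf(term, i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up samples standing in for the `text` argument of writeString.
        // The second one contains a no-break space (U+00A0) instead of a plain space.
        String[] chunks = { "Total amount due", "Total\u00A0amount due" };
        String term = "Total amount";
        for (String chunk : chunks) {
            System.out.println(describe(chunk));
            System.out.println("  offsets of \"" + term + "\": " + findAll(chunk, term));
        }
    }
}
```

If the dump shows that a string differs from the search term in some way that is invisible on screen (a different space character, a word split across two calls), then the problem is in how the tool matches, not in the text extraction. Whether that is what happens with this particular file is something only such a dump can tell.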


was (Author: tilman):
That code is built on top of PDFBox; it isn't authored by the PDFBox team. You should ask the author of that tool, i.e. Leslie Lau, or debug it yourself to find out why it isn't working, e.g. by looking at what's coming into writeString. I'd be glad to help if you can trace this to a bug in PDFBox, but that doesn't look likely considering the text I attached.

> PDFBox recognizes only a few words
> ----------------------------------
>
>                 Key: PDFBOX-4431
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4431
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Documentation, Text extraction
>         Environment: OS: Windows 10.
> IDE: Oxygen.3a Release (4.7.3a)
> PDF version: Adobe Acrobat Pro DC - 2019.010.20069.49826
>            Reporter: Krutheeka Rajkumar
>            Priority: Major
>         Attachments: RS13170.pdf, RS13170.txt
>
>
> The code I have posted takes 5 arguments, including the location of a PDF document and a search term. The code parses the PDF document and returns the locations of all matches for the search term, in the format selected by the last argument.
> For some reason the code recognizes only a few words and errors on other words. I am not sure why this is.
> There seems to be no difference between these words in terms of font, size, location, etc.



