Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/08/11 19:46:48 UTC
[jira] [Closed] (PDFBOX-132) PDFReader text shows as boxes
[ https://issues.apache.org/jira/browse/PDFBOX-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-132.
-------------------------------------
Resolution: Cannot Reproduce
Assignee: Andreas Lehmkühler
Closed because no sample PDF was attached; the issue has most likely been resolved in the meantime.
> PDFReader text shows as boxes
> -----------------------------
>
> Key: PDFBOX-132
> URL: https://issues.apache.org/jira/browse/PDFBOX-132
> Project: PDFBox
> Issue Type: Bug
> Components: PDFReader
> Assignee: Andreas Lehmkühler
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1430273
> Originally submitted by benlitchfield on 2006-02-12 12:44.
> See newcent_soca_trad.pdf
> Ben
> [comment on SourceForge]
> Originally sent by govardhana.
> Logged In: YES
> user_id=1452645
> Hi Ben,
> I was looking at how Acrobat Professional converts PDF files.
> One difference I observed between text extraction in Acrobat
> Professional and in PDFBox is that PDFBox extracts text even
> from image objects when they contain text, whereas Acrobat
> does not: it omits any text that is present in images. That
> accounts for the difference. When text is extracted from
> images and some of it is in a non-ASCII format, that text is
> represented as a box because it is extracted as is from the
> image. Acrobat avoids this kind of extraction and so does not
> produce any boxes.
> This is only my view; if I am wrong, then please excuse me.
> I am waiting for your reply on this.
> Thanking you
> Regards,
> Govardhana
> [comment on SourceForge]
> Originally sent by govardhana.
> Logged In: YES
> user_id=1452645
> Hi Ben,
> I am facing the same problem too. Sometimes when I try to
> extract the content of a PDF file, the extracted text contains
> boxes, and I have no idea what they mean. When I tried to
> extract the same PDF file using Acrobat Professional, all the
> text that contained boxes was eliminated and the rest of the
> text content was extracted.
> I wanted to know whether any remedy has been found, or whether
> you can help me do the same as Acrobat Professional does.
> Thank You
> Regards,
> Govardhan
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira