Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/08/11 19:46:48 UTC
[jira] [Closed] (PDFBOX-132) PDFReader text shows as boxes

     [ https://issues.apache.org/jira/browse/PDFBOX-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-132.
-------------------------------------

    Resolution: Cannot Reproduce
      Assignee: Andreas Lehmkühler

Closed as there isn't any sample PDF, and I guess the issue has most likely been solved in the meantime.
                
> PDFReader text shows as boxes
> -----------------------------
>
>                 Key: PDFBOX-132
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-132
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDFReader
>            Assignee: Andreas Lehmkühler
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1430273
> Originally submitted by benlitchfield on 2006-02-12 12:44.
> See newcent_soca_trad.pdf
> Ben
> [comment on SourceForge]
> Originally sent by govardhana.
> Logged In: YES 
> user_id=1452645
> Hi Ben,
> I was going through the PDF file conversion done by Acrobat
> Professional. One difference I observed between the
> text extraction of Acrobat Professional and PDFBox is
> that, while extracting the text from the PDF file,
> PDFBox extracts even the text from image objects if
> they contain some text, whereas Acrobat doesn't do
> that; it eliminates the text which is present in the
> images. That is what makes the difference. When we start
> extracting the text from the images, any text which
> is in a non-ASCII format is represented as a box,
> because it is extracted as-is from the image. Acrobat
> avoids this type of extraction and does not produce any boxes.
> This is only my view; if I am wrong, then please excuse me.
> I am waiting for your reply on this.
> Thanking you
> Regards,
> Govardhana
> [comment on SourceForge]
> Originally sent by govardhana.
> Logged In: YES 
> user_id=1452645
> Hi Ben,
> I am facing the same problem. Sometimes when I try to
> extract the PDF file content, the extracted text contains
> boxes, and I have no idea what they mean. When I
> tried to extract the same PDF file using Acrobat
> Professional, the text which contained boxes was
> eliminated entirely and the rest of the text content was extracted.
> I wanted to know whether any remedy has been found, or whether you can
> help me do the same as what Acrobat Professional does.
> Thank You
> Regards,
> Govardhan
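One way to approximate the behaviour asked about in the last comment is to post-process the extracted text and drop characters that a viewer is likely to draw as boxes. The sketch below is a heuristic under stated assumptions, not PDFBox functionality and not how Acrobat actually works: it assumes the text has already been extracted (for example with PDFBox's `PDFTextStripper`), and the class `BoxFilter` and its methods are hypothetical names. It treats the Unicode replacement character U+FFFD, private-use and unassigned code points, lone surrogates, and control characters other than newline, carriage return and tab as "box" candidates.

```java
// Hypothetical post-processing helper; not part of the PDFBox API.
final class BoxFilter {

    // Assumption: these code points are the ones most viewers cannot render
    // and therefore typically display as boxes.
    static boolean isLikelyBox(int cp) {
        if (cp == 0xFFFD) {
            return true; // Unicode replacement character (unmappable glyph)
        }
        int type = Character.getType(cp);
        return type == Character.PRIVATE_USE
                || type == Character.UNASSIGNED
                || type == Character.SURROGATE // unpaired surrogate half
                || (type == Character.CONTROL
                        && cp != '\n' && cp != '\r' && cp != '\t');
    }

    // Returns a copy of already-extracted text with likely-box code points removed.
    static String stripBoxes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !isLikelyBox(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input mixing normal text with a private-use character,
        // a replacement character and a control character.
        String extracted = "Caf\u00e9 \uE001\uFFFD text\u0001\n";
        System.out.print(stripBoxes(extracted));
    }
}
```

Note that this removes characters rather than repairing them: whether a character shows up as a box also depends on the font of whatever displays the text, so a filter like this can discard content that another viewer would render correctly.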

--
This message is automatically generated by JIRA.