Posted to dev@pdfbox.apache.org by "Brian Carrier (JIRA)" <ji...@apache.org> on 2009/02/04 20:02:04 UTC

[jira] Created: (PDFBOX-419) Provide info on number of characters in document that were mapped and decoded.

Provide info on number of characters in document that were mapped and decoded.
------------------------------------------------------------------------------

                 Key: PDFBOX-419
                 URL: https://issues.apache.org/jira/browse/PDFBOX-419
             Project: PDFBox
          Issue Type: New Feature
          Components: Text extraction
            Reporter: Brian Carrier
            Priority: Minor


For various reasons, some text cannot be extracted from PDF files. A "?" is written to the text output in those cases, but this does not let an automated system determine how much of the document PDFBox was able to process. There should be a way for the caller to determine how much of the file PDFBox could process.
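
To illustrate why the "?" placeholder alone is not enough, here is a minimal sketch in plain Java. It does not use PDFBox; the class name, method name, and sample strings are hypothetical and exist only for this example. It shows that counting "?" in the extracted text cannot tell a genuine question mark from a character that failed to decode.

```java
public class QuestionMarkAmbiguity {

    // Counts '?' characters in text that has already been extracted.
    static int countQuestionMarks(String extracted) {
        int count = 0;
        for (int i = 0; i < extracted.length(); i++) {
            if (extracted.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical case 1: the document really contains a question mark.
        String genuine = "Why?";
        // Hypothetical case 2: the last glyph could not be decoded and was
        // replaced with the "?" placeholder.
        String placeholder = "Why?";

        // Both outputs are identical, so the counts are identical too.
        System.out.println(countQuestionMarks(genuine));     // prints 1
        System.out.println(countQuestionMarks(placeholder)); // prints 1
    }
}
```

Because the two outputs are byte-for-byte the same, any reliable measure has to be recorded while decoding, not recovered from the output afterwards.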

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (PDFBOX-419) Provide info on number of characters in document that were mapped and decoded.

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-419.
----------------------------------

    Resolution: Fixed
      Assignee: Brian Carrier

Fixed by keeping track of how many characters were mapped and decoded, and adding methods to access those counts.

Sending        PDFStreamEngine.java
Transmitting file data .
Committed revision 740843.
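
As a rough illustration of the approach described in the resolution (count during decoding, then expose the counts through accessor methods), here is a self-contained sketch. It is not the code committed to PDFStreamEngine.java; the class `DecodeStats` and all of its method names are assumptions made for this example, and a `null` input stands in for a character code that has no mapping.

```java
public class DecodeStats {

    // Hypothetical counters: every character seen, and those that decoded.
    private int totalCharCount;
    private int validCharCount;

    // Records one character. 'decoded' is null when no mapping was found,
    // in which case the "?" placeholder is returned for the text output.
    public String record(String decoded) {
        totalCharCount++;
        if (decoded == null) {
            return "?";
        }
        validCharCount++;
        return decoded;
    }

    public int getTotalCharCount() {
        return totalCharCount;
    }

    public int getValidCharCount() {
        return validCharCount;
    }

    public static void main(String[] args) {
        DecodeStats stats = new DecodeStats();
        StringBuilder out = new StringBuilder();

        // Hypothetical decode results: two letters, one failure, and one
        // genuine question mark from the document itself.
        String[] decodedOrNull = {"H", "i", null, "?"};
        for (String s : decodedOrNull) {
            out.append(stats.record(s));
        }

        // prints "Hi?? 3/4"
        System.out.println(out + " " + stats.getValidCharCount()
                + "/" + stats.getTotalCharCount());
    }
}
```

With counts like these, a caller could compare the valid count to the total to estimate how much of a document was decoded, even when the output text contains genuine question marks.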

> Provide info on number of characters in document that were mapped and decoded.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-419
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-419
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Text extraction
>            Reporter: Brian Carrier
>            Assignee: Brian Carrier
>            Priority: Minor
>
> For various reasons, some text cannot be extracted from PDF files. A "?" is written to the text output in those cases, but this does not let an automated system determine how much of the document PDFBox was able to process. There should be a way for the caller to determine how much of the file PDFBox could process.
