
Posted to dev@pdfbox.apache.org by "Hesham (JIRA)" <ji...@apache.org> on 2011/01/06 08:49:46 UTC

[jira] Created: (PDFBOX-935) Text not extracted with PDFBox 1.4

Text not extracted with PDFBox 1.4
----------------------------------

                 Key: PDFBOX-935
                 URL: https://issues.apache.org/jira/browse/PDFBOX-935
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.4.0
            Reporter: Hesham
             Fix For: 1.2.1
         Attachments: data_not_extracted.pdf

I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfectly. But now I have tested it with PDFBox v1.4, and most of the text is not extracted.
I have attached a 1-page PDF file to test.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PDFBOX-935) Text not extracted with PDFBox 1.4

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-935:
--------------------------------------

    Fix Version/s:     (was: 1.2.1)

> Text not extracted with PDFBox 1.4
> ----------------------------------
>
>                 Key: PDFBOX-935
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-935
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>         Attachments: data_not_extracted.pdf
>
>
> I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfectly. But now I have tested it with PDFBox v1.4, and most of the text is not extracted.
> I have attached a 1-page PDF file to test.


[jira] Resolved: (PDFBOX-935) Text not extracted with PDFBox 1.4

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-935.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.5.0

I improved some of the font handling in revision 1056721. In this particular case the given encoding was overwritten by an empty encoding, so the extraction was bound to fail. This is fixed now, and the extraction works as expected.
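
To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It is not the code changed in revision 1056721 and does not use any PDFBox classes; the class name `EncodingOverwriteSketch`, its methods, and the map-based "encoding" are all hypothetical stand-ins. It only shows why replacing a usable encoding with an empty one yields no text, and one possible guard against that, assuming an empty encoding should never replace a populated one.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names are PDFBox classes.
final class EncodingOverwriteSketch {

    // Maps each character code to text; codes missing from the encoding yield nothing.
    static String decode(int[] codes, Map<Integer, String> encoding) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String mapped = encoding.get(code);
            if (mapped != null) {
                text.append(mapped);
            }
        }
        return text.toString();
    }

    // Faulty pattern: the later encoding always replaces the given one, even when it is empty.
    static Map<Integer, String> overwriteAlways(Map<Integer, String> given,
                                                Map<Integer, String> later) {
        return later;
    }

    // Guarded pattern (an assumption, not the actual fix): keep the given encoding
    // when the later one is missing or carries no mappings.
    static Map<Integer, String> overwriteUnlessEmpty(Map<Integer, String> given,
                                                     Map<Integer, String> later) {
        return (later == null || later.isEmpty()) ? given : later;
    }
}
```

With a given encoding that maps 72 to "H" and 105 to "i", decoding the codes 72 and 105 after `overwriteAlways(given, emptyMap)` produces an empty string, while `overwriteUnlessEmpty(given, emptyMap)` keeps the given encoding, so the same codes decode to "Hi".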

> Text not extracted with PDFBox 1.4
> ----------------------------------
>
>                 Key: PDFBOX-935
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-935
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>             Fix For: 1.5.0
>
>         Attachments: data_not_extracted.pdf
>
>
> I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfectly. But now I have tested it with PDFBox v1.4, and most of the text is not extracted.
> I have attached a 1-page PDF file to test.


[jira] Assigned: (PDFBOX-935) Text not extracted with PDFBox 1.4

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-935:
-----------------------------------------

    Assignee: Andreas Lehmkühler

> Text not extracted with PDFBox 1.4
> ----------------------------------
>
>                 Key: PDFBOX-935
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-935
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.5.0
>
>         Attachments: data_not_extracted.pdf
>
>
> I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfectly. But now I have tested it with PDFBox v1.4, and most of the text is not extracted.
> I have attached a 1-page PDF file to test.


[jira] Updated: (PDFBOX-935) Text not extracted with PDFBox 1.4

Posted by "Hesham (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hesham updated PDFBOX-935:
--------------------------

    Attachment: data_not_extracted.pdf

> Text not extracted with PDFBox 1.4
> ----------------------------------
>
>                 Key: PDFBOX-935
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-935
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>             Fix For: 1.2.1
>
>         Attachments: data_not_extracted.pdf
>
>
> I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfectly. But now I have tested it with PDFBox v1.4, and most of the text is not extracted.
> I have attached a 1-page PDF file to test.
