Posted to dev@pdfbox.apache.org by "Hesham (JIRA)" <ji...@apache.org> on 2011/01/06 08:49:46 UTC
[jira] Created: (PDFBOX-935) Text not extracted with PDFBox 1.4
Text not extracted with PDFBox 1.4
----------------------------------
Key: PDFBOX-935
URL: https://issues.apache.org/jira/browse/PDFBOX-935
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.4.0
Reporter: Hesham
Fix For: 1.2.1
Attachments: data_not_extracted.pdf
I used PDFBox v1.2.1 to extract text from a PDF file, and it worked perfectly. With PDFBox v1.4, however, most of the text is not extracted.
I have attached a 1-page PDF file to test.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PDFBOX-935) Text not extracted with PDFBox 1.4
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-935:
--------------------------------------
Fix Version/s: (was: 1.2.1)
[jira] Resolved: (PDFBOX-935) Text not extracted with PDFBox 1.4
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-935.
---------------------------------------
Resolution: Fixed
Fix Version/s: 1.5.0
I improved some of the font handling in revision 1056721. In this particular case the given encoding was overwritten by an empty encoding, so the extraction was bound to fail. This is fixed now and the extraction works as expected.
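For readers unfamiliar with this class of bug, the following is a minimal, hypothetical sketch of the failure mode described above. It is not the code from revision 1056721 and does not use PDFBox classes; the names EncodingSketch, Encoding, Font, overwriteEncoding and mergeEncoding are invented for illustration. It assumes a font that decodes character codes through a lookup table, and shows how replacing a populated table with an empty one silently drops all text, while a guarded update keeps the original mapping.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from PDFBox.
final class EncodingSketch {

    /** Maps character codes from a content stream to Unicode text. */
    static final class Encoding {
        private final Map<Integer, String> codeToText;

        Encoding(Map<Integer, String> codeToText) {
            this.codeToText = new HashMap<>(codeToText);
        }

        boolean isEmpty() {
            return codeToText.isEmpty();
        }

        /** Returns the mapped text, or null when the code is unmapped. */
        String decode(int code) {
            return codeToText.get(code);
        }
    }

    static final class Font {
        private Encoding encoding;

        Font(Encoding encoding) {
            this.encoding = encoding;
        }

        /** Problematic variant: replaces the encoding unconditionally. */
        void overwriteEncoding(Encoding candidate) {
            encoding = candidate;
        }

        /** Guarded variant: keeps the current encoding if the candidate is null or empty. */
        void mergeEncoding(Encoding candidate) {
            if (candidate != null && !candidate.isEmpty()) {
                encoding = candidate;
            }
        }

        /** Decodes the codes, skipping any that have no mapping. */
        String extract(int[] codes) {
            StringBuilder sb = new StringBuilder();
            for (int code : codes) {
                String text = encoding.decode(code);
                if (text != null) {
                    sb.append(text);
                }
            }
            return sb.toString();
        }
    }

    /** A made-up three-entry encoding used by the demo below. */
    static Encoding sampleEncoding() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "P");
        map.put(2, "D");
        map.put(3, "F");
        return new Encoding(map);
    }

    public static void main(String[] args) {
        int[] codes = {1, 2, 3};
        Encoding empty = new Encoding(Collections.<Integer, String>emptyMap());

        Font overwritten = new Font(sampleEncoding());
        overwritten.overwriteEncoding(empty);
        System.out.println("overwritten: \"" + overwritten.extract(codes) + "\"");

        Font guarded = new Font(sampleEncoding());
        guarded.mergeEncoding(empty);
        System.out.println("guarded: \"" + guarded.extract(codes) + "\"");
    }
}
```

In this sketch the overwritten font yields an empty string for the same codes that the guarded font decodes to "PDF", which mirrors the symptom in the report: the page parses without error, but most of the text is missing from the output.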
[jira] Assigned: (PDFBOX-935) Text not extracted with PDFBox 1.4
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler reassigned PDFBOX-935:
-----------------------------------------
Assignee: Andreas Lehmkühler
[jira] Updated: (PDFBOX-935) Text not extracted with PDFBox 1.4
Posted by "Hesham (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hesham updated PDFBOX-935:
--------------------------
Attachment: data_not_extracted.pdf