Posted to dev@pdfbox.apache.org by "Joscha Feth (JIRA)" <ji...@apache.org> on 2011/06/24 08:14:47 UTC
[jira] [Created] (PDFBOX-1048) Extracted PDF (text) partially garbled
Extracted PDF (text) partially garbled
--------------------------------------
Key: PDFBOX-1048
URL: https://issues.apache.org/jira/browse/PDFBOX-1048
Project: PDFBox
Issue Type: Bug
Environment: OSX 10.6
Reporter: Joscha Feth
When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1048) Extracted PDF (text) partially garbled
Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054255#comment-13054255 ]
Joscha Feth commented on PDFBOX-1048:
-------------------------------------
The same happens when running PDFBox directly:
java -jar pdfbox-app-1.5.0.jar ExtractText agb_de.pdf output.nfo
> Extracted PDF (text) partially garbled
> --------------------------------------
>
> Key: PDFBOX-1048
> URL: https://issues.apache.org/jira/browse/PDFBOX-1048
> Project: PDFBox
> Issue Type: Bug
> Environment: OSX 10.6
> Reporter: Joscha Feth
> Attachments: PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.
[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled
Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joscha Feth updated PDFBOX-1048:
--------------------------------
Attachment: Auftragsbestätigung.pdf
Another PDF that shows the same problem.
> Extracted PDF (text) partially garbled
> --------------------------------------
>
> Key: PDFBOX-1048
> URL: https://issues.apache.org/jira/browse/PDFBOX-1048
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.5.0
> Environment: OSX 10.6
> Reporter: Joscha Feth
> Attachments: Auftragsbestätigung.pdf, PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.
[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled
Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joscha Feth updated PDFBOX-1048:
--------------------------------
Affects Version/s: 1.5.0
> Extracted PDF (text) partially garbled
> --------------------------------------
>
> Key: PDFBOX-1048
> URL: https://issues.apache.org/jira/browse/PDFBOX-1048
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.5.0
> Environment: OSX 10.6
> Reporter: Joscha Feth
> Attachments: PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.
[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled
Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joscha Feth updated PDFBOX-1048:
--------------------------------
Attachment: output.nfo, PDFTest.java, agb_de.pdf
agb_de.pdf is the affected PDF.
PDFTest.java prints the text extracted from the PDF.
output.nfo contains the garbled text extracted from the PDF.
> Extracted PDF (text) partially garbled
> --------------------------------------
>
> Key: PDFBOX-1048
> URL: https://issues.apache.org/jira/browse/PDFBOX-1048
> Project: PDFBox
> Issue Type: Bug
> Environment: OSX 10.6
> Reporter: Joscha Feth
> Attachments: PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.