Posted to dev@pdfbox.apache.org by "Joscha Feth (JIRA)" <ji...@apache.org> on 2011/06/24 08:14:47 UTC

[jira] [Created] (PDFBOX-1048) Extracted PDF (text) partially garbled

Extracted PDF (text) partially garbled
--------------------------------------

                 Key: PDFBOX-1048
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1048
             Project: PDFBox
          Issue Type: Bug
         Environment: OSX 10.6
            Reporter: Joscha Feth


When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PDFBOX-1048) Extracted PDF (text) partially garbled

Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054255#comment-13054255 ] 

Joscha Feth commented on PDFBOX-1048:
-------------------------------------

The same happens when running PDFBox directly:

java -jar pdfbox-app-1.5.0.jar ExtractText agb_de.pdf output.nfo



> Extracted PDF (text) partially garbled
> --------------------------------------
>
>                 Key: PDFBOX-1048
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1048
>             Project: PDFBox
>          Issue Type: Bug
>         Environment: OSX 10.6
>            Reporter: Joscha Feth
>         Attachments: PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.


        

[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled

Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joscha Feth updated PDFBOX-1048:
--------------------------------

    Attachment: Auftragsbestätigung.pdf

Another PDF that shows the same problem.

> Extracted PDF (text) partially garbled
> --------------------------------------
>
>                 Key: PDFBOX-1048
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1048
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: OSX 10.6
>            Reporter: Joscha Feth
>         Attachments: Auftragsbestätigung.pdf, PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.


       

[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled

Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joscha Feth updated PDFBOX-1048:
--------------------------------

    Affects Version/s: 1.5.0

> Extracted PDF (text) partially garbled
> --------------------------------------
>
>                 Key: PDFBOX-1048
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1048
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: OSX 10.6
>            Reporter: Joscha Feth
>         Attachments: PDFTest.java, agb_de.pdf, output.nfo
>
>
> When using Tika 0.9 to extract text from the given PDF, the text is partially garbled.


        

[jira] [Updated] (PDFBOX-1048) Extracted PDF (text) partially garbled

Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joscha Feth updated PDFBOX-1048:
--------------------------------

    Attachment: output.nfo
                PDFTest.java
                agb_de.pdf

agb_de.pdf is the respective PDF
PDFTest.java prints out the converted PDF
output.nfo contains the garbled textual contents of the PDF

