Posted to dev@pdfbox.apache.org by "Christian Czech (JIRA)" <ji...@apache.org> on 2012/07/23 17:57:34 UTC

[jira] [Created] (PDFBOX-1362) Slovakian characters

Christian Czech created PDFBOX-1362:
---------------------------------------

             Summary: Slovakian characters
                 Key: PDFBOX-1362
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1362
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.7.0
         Environment: Windows XP, Java 1.6.0_33
            Reporter: Christian Czech


Hello,

I have a PDF document with Slovakian characters:

Hlavní administrátor

My code:

PDDocument document = null;
document = PDDocument.load(pdfFile, true);
PDFTextStripper stripper = null;
stripper = new PDFTextStripper("ISO-8859-2");
stripper.getText(document);

I always get this result: Hlavn\? administr\ ?tor 
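
A quick way to narrow this down is to check whether the target charset can represent the expected characters at all. The sketch below is not part of the original report and does not use PDFBox; the class name EncodingCheck and its helper methods are hypothetical. It uses only java.nio.charset to test whether "Hlavní administrátor" survives encoding to ISO-8859-2 and back. Non-ASCII letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    /** Returns true if every character of text can be encoded in the named charset. */
    public static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    /** Encodes text to bytes in the named charset, then decodes the bytes back. */
    public static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "Hlavní administrátor" with the accented letters as Unicode escapes
        String expected = "Hlavn\u00ed administr\u00e1tor";

        // Can ISO-8859-2 represent every character of the expected text?
        System.out.println(canEncode(expected, "ISO-8859-2"));

        // Does the text survive an encode/decode round trip unchanged?
        System.out.println(expected.equals(roundTrip(expected, "ISO-8859-2")));
    }
}
```

Both checks should print true, since ISO-8859-2 contains both í and á. If so, the charset passed to PDFTextStripper is unlikely to be what loses the characters, and the problem would more plausibly lie in how the text is decoded from the PDF or in how the result is printed.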

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (PDFBOX-1362) Slovakian characters

Posted by "Christian Czech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Czech updated PDFBOX-1362:
------------------------------------

    Attachment: test_7_2_test.pdf
    

       

[jira] [Commented] (PDFBOX-1362) Slovakian characters

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490524#comment-13490524 ] 

Andreas Lehmkühler commented on PDFBOX-1362:
--------------------------------------------

The most recent version is 1.7.1. There isn't any plan for a next release yet.

Please don't hijack JIRA issues for such questions. Use our mailing lists instead [1].

[1] http://pdfbox.apache.org/mail-lists.html
                
> Slovakian characters
> --------------------
>
>                 Key: PDFBOX-1362
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1362
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.0
>         Environment: Windows XP, Java 1.6.0_33
>            Reporter: Christian Czech
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.0
>
>         Attachments: PDFBOX-1362.patch, test_7_2_test.pdf


[jira] [Commented] (PDFBOX-1362) Slovakian characters

Posted by "Joe Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489024#comment-13489024 ] 

Joe Lee commented on PDFBOX-1362:
---------------------------------

Andreas,

Could you show me the URL where I can download the latest PDFBox jar file? Version 2.0.0, or at least 1.8.0. Thanks.

Joe




                

[jira] [Resolved] (PDFBOX-1362) Slovakian characters

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1362.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.8.0
         Assignee: Andreas Lehmkühler

I applied the fix in revision 1404698 as proposed.

Thanks for the contribution!

                