Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/02/20 13:00:38 UTC

[jira] Commented: (PDFBOX-964) Wrong characters

    [ https://issues.apache.org/jira/browse/PDFBOX-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997131#comment-12997131 ] 

Andreas Lehmkühler commented on PDFBOX-964:
-------------------------------------------

The attached PDF works fine with the current trunk using the ExtractMetadata example [1]. The PDF is encrypted; did you decrypt it before extracting the metadata?


[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractMetadata.java
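
As an illustration of why reading strings from an encrypted file without decrypting it first yields junk, here is a minimal sketch using only the JDK. It does not use PDFBox, and everything in it is an assumption made for the example: the class name EncryptedStringDemo, the helper rc4, the key "demo-key" and the title "Example Title" are hypothetical, and RC4 is simply one cipher that PDF encryption can use, not necessarily the one used by the attached file.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class; not part of PDFBox.
class EncryptedStringDemo {

    // Applies RC4 with the given key. RC4 is a symmetric stream cipher,
    // so calling this twice with the same key restores the input.
    static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available in this JRE", e);
        }
    }

    public static void main(String[] args) {
        // Assumed key and title, chosen only for this illustration.
        byte[] key = "demo-key".getBytes(StandardCharsets.US_ASCII);
        byte[] title = "Example Title".getBytes(StandardCharsets.ISO_8859_1);

        // What an encrypted file would store for the string.
        byte[] stored = rc4(key, title);

        // Reading the stored bytes directly: ciphertext shown as text.
        System.out.println("Title=" + new String(stored, StandardCharsets.ISO_8859_1));

        // Decrypting first: the original text comes back.
        System.out.println("Title=" + new String(rc4(key, stored), StandardCharsets.ISO_8859_1));
    }
}
```

The first line printed should show unreadable characters, much like the output quoted below, while the second shows the readable title once the same key has been applied again.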

> Wrong characters
> ----------------
>
>                 Key: PDFBOX-964
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-964
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.1.0, 1.4.0, 1.5.0
>         Environment: Linux
>            Reporter: Bogdan Artyushenko
>         Attachments: nsw-solar-feed-in-tariff-report-to-ministers.pdf
>
>
> I have a PDF document (PDF version 1.5), and when I try to process it, PDFBox returns junk characters.
> For example:
>             PDDocumentInformation info = doc.getDocumentInformation();
>             System.out.println("Title=" + info.getTitle());
>             System.out.println("Author=" + info.getAuthor());
>             System.out.println("Subject=" + info.getSubject());
>             System.out.println("Keywords=" + info.getKeywords());
>             System.out.println("Creator=" + info.getCreator());
>             System.out.println("Producer=" + info.getProducer());
>             System.out.println("Creation Date=" + info.getCreationDate());
> Returns
> Title=o,¢‘b‰zbÜcqhg­6cZêeGŸ9øÀÈÕ߶¹àéXð‡A<\Ðh„žÔ„Ñ®1
> Author=o,¢‘v‰`
> Subject=null
> Keywords=null
> Creator=Q“÷P…b
> b…h6tzeyúLc^àb        ®4íÓ˜…¸ì
> Producer=Q“÷P…b
> b…h6tze<Z¸R"
> The same happens when I try to parse the file (I need to find all links in it).
> For this I use:
>             for (final Iterator jt = annotations.iterator(); jt.hasNext();) {
>                 final PDAnnotation annot = (PDAnnotation) jt.next();
>                 if (!annot.isInvisible()) {
>                     if (annot instanceof PDAnnotationLink) {
>                         final PDAnnotationLink link = (PDAnnotationLink) annot;
>                         final PDAction action = link.getAction();
>                         if (action instanceof PDActionURI) {
>                             final PDActionURI uri = (PDActionURI) action;
> And I got links of type "N<»¬_f`ȇœø²\½8Ø,ÑBä<ʓÇ{".
> But if I open it with Document Viewer, Adobe Reader or Midnight Commander, I don't see any problems there.
> I have tested it with PDFBox versions 1.1, 1.4 and 1.5.
