Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/02/20 13:00:38 UTC
[jira] Commented: (PDFBOX-964) Wrong charecters
[ https://issues.apache.org/jira/browse/PDFBOX-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997131#comment-12997131 ]
Andreas Lehmkühler commented on PDFBOX-964:
-------------------------------------------
The attached PDF works fine with the current trunk using the ExtractMetadata example [1]. The PDF is encrypted; did you decrypt it before extracting the metadata?
[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractMetadata.java
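To illustrate why an encrypted document can produce junk like the output quoted below, here is a minimal sketch that uses only JDK classes, not PDFBox. It assumes the strings are stored as RC4 ciphertext. The class name, key, and title are all made up for the example, and this is not how PDFBox itself is implemented. Reading the stored bytes without decrypting them gives random-looking characters, while decrypting them first gives the real text.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class EncryptedStringDemo {

    // Hypothetical helper: applies RC4 with the given key. RC4 is a
    // symmetric stream cipher, so the same call encrypts and decrypts.
    static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 not available", e);
        }
    }

    public static void main(String[] args) {
        // Made-up 16-byte key and title, purely for illustration.
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.US_ASCII);
        String title = "Example report title";

        // What an encrypted file would hold for the title string.
        byte[] stored = rc4(key, title.getBytes(StandardCharsets.ISO_8859_1));

        // Reading the stored bytes as text without decrypting: junk.
        System.out.println("Title=" + new String(stored, StandardCharsets.ISO_8859_1));

        // Decrypting first recovers the original string.
        byte[] decrypted = rc4(key, stored);
        System.out.println("Title=" + new String(decrypted, StandardCharsets.ISO_8859_1));
    }
}
```

The first line printed is unreadable, and the second is the original title. With PDFBox the equivalent step is to decrypt the document before calling getDocumentInformation(), as the ExtractMetadata example [1] does.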
> Wrong charecters
> ----------------
>
> Key: PDFBOX-964
> URL: https://issues.apache.org/jira/browse/PDFBOX-964
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.1.0, 1.4.0, 1.5.0
> Environment: Linux
> Reporter: Bogdan Artyushenko
> Attachments: nsw-solar-feed-in-tariff-report-to-ministers.pdf
>
>
> I have a PDF document (PDF version 1.5), and when I try to work with it, PDFBox shows junk characters.
> For example:
> PDDocumentInformation info = doc.getDocumentInformation();
> System.out.println("Title=" + info.getTitle());
> System.out.println("Author=" + info.getAuthor());
> System.out.println("Subject=" + info.getSubject());
> System.out.println("Keywords=" + info.getKeywords());
> System.out.println("Creator=" + info.getCreator());
> System.out.println("Producer=" + info.getProducer());
> System.out.println("Creation Date=" + info.getCreationDate());
> Returns:
> Title=o,¢bzbÜcqhg6cZêeG9øÀÈÕ߶¹àéXðA<\ÐhÔÑ®1
> Author=o,¢v`
> Subject=null
> Keywords=null
> Creator=Q÷P
b
> b
h6tzeyúLc^àb ®4íÓ
¸ì
> Producer=Q÷P
b
> b
h6tze<Z¸R"
> The same happens when I try to parse the file (I need to find all the links in it).
> For this I use:
> for (final Iterator jt = annotations.iterator(); jt.hasNext();) {
>     final PDAnnotation annot = (PDAnnotation) jt.next();
>     if (!annot.isInvisible()) {
>         if (annot instanceof PDAnnotationLink) {
>             final PDAnnotationLink link = (PDAnnotationLink) annot;
>             final PDAction action = link.getAction();
>             if (action instanceof PDActionURI) {
>                 final PDActionURI uri = (PDActionURI) action;
>             }
>         }
>     }
> }
> And I get links such as "N<»¬_f`Èø²\½8Ø,ÑBä<ÊÇ{".
> But if I open the file with Document Viewer, Adobe Reader, or Midnight Commander, I don't see any problems.
> I have tested this with PDFBox versions 1.1, 1.4, and 1.5.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira