Posted to dev@pdfbox.apache.org by "Ernesto De Santis (JIRA)" <ji...@apache.org> on 2009/12/12 20:27:18 UTC

[jira] Commented: (PDFBOX-534) PDF file created with LaTeX is bad parsed

    [ https://issues.apache.org/jira/browse/PDFBOX-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789779#action_12789779 ] 

Ernesto De Santis commented on PDFBOX-534:
------------------------------------------

With the current code, the error is still there.

I got the code today (12/12/2009) from:
http://svn.apache.org/repos/asf/pdfbox/trunk

Output:
a73a109a112a108a101a109a101a110a116a97a110a100a111 a97a99a99a101a115a111 a97 a115a105a115a116a101a109a97a115 a100a101
a97a114a99a104a105a118a111a115 a118a105a114a116a117a97a108a101a115 a112a97a114a97 a108a97 a104a101a114a114a97a109a105a101a110a116a97
a100a101 a98a250a115a113a117a101a100a97 a75a110a101a111a98a97a115a101
a65a108a117a109a110a111a58 a69a114a110a101a115a116a111 a68a101 a83a97a110a116a105a115
a68a105a114a101a99a116a111a114a58 a80a97a98a108a111 a69a114a110a101a115a116a111 a77a97a114a116a237a110a101a122 a76a243a112a101a122
and more.....
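
To check the guess from the issue description that each aNN token is the character code of the real character, here is a minimal sketch that decodes the output above. It assumes every token is the letter "a" followed by a decimal code point; the class name GlyphCodeDecoder and the method decode are hypothetical helpers for this check, not part of PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: turns "a73a109..." back into text,
// assuming each token is 'a' followed by a decimal character code.
final class GlyphCodeDecoder {
    // One token: the letter 'a' followed by one or more decimal digits.
    private static final Pattern TOKEN = Pattern.compile("a(\\d+)");

    static String decode(String garbled) {
        Matcher m = TOKEN.matcher(garbled);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(code));
            // quoteReplacement keeps '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        // Copy whatever follows the last token (spaces, punctuation) unchanged.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // First two words of the output above.
        System.out.println(decode(
            "a73a109a112a108a101a109a101a110a116a97a110a100a111 a97a99a99a101a115a111"));
        // prints "Implementando acceso"
    }
}
```

The decoded words match the real title of the document, so the numbers do seem to be the character codes. This only works around the symptom after extraction; it would also rewrite any genuine "a" followed by digits in normal text.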

> PDF file created with LaTeX is bad parsed
> -----------------------------------------
>
>                 Key: PDFBOX-534
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-534
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>         Environment: Linux/Ubuntu 9
>            Reporter: Ernesto De Santis
>         Attachments: kvfs.pdf, kvfs.txt
>
>
> I'm getting unexpected behavior when parsing a PDF file.
> I'm trying to get the clean body text of a file, and I get a lot of aXX strings, where each X is a digit. They appear to be the character codes of the real characters, but I am not sure.
> My code is very simple:
>           String[] args = {"/home/ernesto/tesis/documento/kvfs.pdf"};
>           ExtractText.main(args);
> I used PDFBox version 0.8.0-incubator, built on 20/9/2009.
> The output I get is:
> a73a109a112a108a101a109a101a110a116a97a110a100a111 a97a99a99a101a115a111 a97 a115a105a115a116a101a109a97a115 a100a101
> a97a114a99a104a105a118a111a115 a118a105a114a116a117a97a108a101a115 a112a97a114a97 a108a97 a104a101a114a114a97a109a105a101a110a116a97
> a100a101 a98a250a115a113a117a101a100a97 a75a110a101a111a98a97a115a101
> and more ......
> The PDF file was generated by the pdflatex command on Ubuntu 9.
> The PDF properties are:
> producer: pdfTeX-1.40.3
> format: PDF-1.4
> security: NO
> optimized: NO
> paper: A4, vertical (210 x 297 mm)
> When I run the PDFBox test, I get this on the console:
> 0 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: d
> INFO  [main]: unsupported/disabled operation: d
> 7 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: J
> INFO  [main]: unsupported/disabled operation: J
> 7 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: m
> INFO  [main]: unsupported/disabled operation: m
> 7 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: l
> INFO  [main]: unsupported/disabled operation: l
> 7 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: S
> INFO  [main]: unsupported/disabled operation: S
> 272 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: re
> INFO  [main]: unsupported/disabled operation: re
> 272 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: f
> INFO  [main]: unsupported/disabled operation: f
> 1274 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: rg
> INFO  [main]: unsupported/disabled operation: rg
> 1275 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: RG
> INFO  [main]: unsupported/disabled operation: RG
> 1536 [main] INFO org.apache.pdfbox.util.PDFStreamEngine  - unsupported/disabled operation: f*
> INFO  [main]: unsupported/disabled operation: f*
