Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/09/02 21:23:53 UTC

[jira] Resolved: (PDFBOX-568) testextract failure on Linux and Mac OS X

     [ https://issues.apache.org/jira/browse/PDFBOX-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-568.
---------------------------------------

    Fix Version/s: 1.3.0
       Resolution: Fixed

Revision 992066 fixes the text extraction issue with sample_fonts_solidconvertor.pdf and cweb.pdf from our test arena.

To achieve that, I rearranged and improved the code that handles the encoding. The next step will hopefully be adding support for CID-coded fonts.

> testextract failure on Linux and Mac OS X
> -----------------------------------------
>
>                 Key: PDFBOX-568
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-568
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Jukka Zitting
>             Fix For: 1.3.0
>
>
> As discussed on the mailing list, the extraction test case seems to fail on non-Windows platforms.
> The troublesome test file is sample_fonts_solidconvertor.pdf, and the textextract.log file says the following (^@ is U+0000 and � is U+FFFD):
> Lines differ at index expected:46-253 actual:46-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
>   expected line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
>   actual line was:   "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
>   expected line was: "^AY^A~^@ý^@á^@í^@é"
>   actual line was:   "^AY^A~^@�^@�^@�^@�"
> Lines differ at index expected:52-253 actual:52-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
>   expected line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
>   actual line was:   "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
>   expected line was: "^AY^A~^@ý^@á^@í^@é"
>   actual line was:   "^AY^A~^@�^@�^@�^@�"
> Preparing to parse sample_fonts_solidconvertor.pdf for sorted test
> Lines differ at index expected:46-253 actual:46-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
>   expected line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
>   actual line was:   "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:0-253 actual:0-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
>   expected line was: "^@ý^@á^@í^@é"
>   actual line was:   "^@�^@�^@�^@�"
> Lines differ at index expected:52-253 actual:52-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
>   expected line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
>   actual line was:   "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
>   expected line was: "^A~^AY^@ý^@á^@í^@é"
>   actual line was:   "^A~^AY^@�^@�^@�^@�"
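
The number pairs in the log above (253 expected, 65533 actual) are the code points U+00FD (ý) and U+FFFD (the replacement character). The sketch below shows one way such a substitution can arise in Java. It assumes, and the issue does not confirm this, that a byte such as 0xFD ends up being decoded with a charset in which it is not valid on its own (for example UTF-8 rather than ISO-8859-1). The class and method names are hypothetical and are not part of PDFBox.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not PDFBox code.
class ReplacementCharDemo {

    // Decode the bytes with the given charset and return the first code point.
    static int decodeFirst(byte[] bytes, Charset charset) {
        return new String(bytes, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // 0xFD is 'ý' in ISO-8859-1, but it is not a valid UTF-8 sequence.
        byte[] raw = { (byte) 0xFD };

        // Decoded as ISO-8859-1: U+00FD, the "expected" value in the log.
        System.out.println(decodeFirst(raw, StandardCharsets.ISO_8859_1)); // 253

        // Decoded as UTF-8: the malformed byte becomes U+FFFD, the "actual" value.
        System.out.println(decodeFirst(raw, StandardCharsets.UTF_8)); // 65533
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the result identical on Windows, Linux and Mac OS X.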
