Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/09/02 21:23:53 UTC
[jira] Resolved: (PDFBOX-568) testextract failure on Linux and Mac OS X
[ https://issues.apache.org/jira/browse/PDFBOX-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-568.
---------------------------------------
Fix Version/s: 1.3.0
Resolution: Fixed
Revision 992066 fixes the text extraction issue with sample_fonts_solidconvertor.pdf and cweb.pdf from our test arena.
To achieve that, I rearranged and improved the code that handles the encoding. The next step will hopefully be adding support for CID-coded fonts.
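For anyone reading the log quoted below: the issue does not spell out the root cause, but the reported values (253 expected, 65533 actual) are what you would see if a single byte 0xFD were decoded with a platform-dependent charset. The following is only a minimal sketch under that assumption; the class and method names are hypothetical and are not taken from the PDFBox code base.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
final class DefaultCharsetDemo {

    // Decodes one byte with the given charset and returns the first
    // resulting UTF-16 code unit as an int.
    static int decodeFirstChar(byte b, Charset charset) {
        String decoded = new String(new byte[] { b }, charset);
        return decoded.charAt(0);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xFD; // "ý" in ISO-8859-1

        // Single-byte Latin charset: 0xFD maps to U+00FD.
        System.out.println(decodeFirstChar(raw, StandardCharsets.ISO_8859_1)); // 253

        // UTF-8: a lone 0xFD is malformed and is replaced by U+FFFD.
        System.out.println(decodeFirstChar(raw, StandardCharsets.UTF_8)); // 65533

        // Whatever the JVM picked up from the platform; this is the value
        // that would differ between Windows and Linux/Mac OS X setups.
        System.out.println(Charset.defaultCharset());
    }
}
```

If this assumption holds, passing an explicit charset wherever bytes are turned into strings would make the result independent of the platform default.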
> testextract failure on Linux and Mac OS X
> -----------------------------------------
>
> Key: PDFBOX-568
> URL: https://issues.apache.org/jira/browse/PDFBOX-568
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Reporter: Jukka Zitting
> Fix For: 1.3.0
>
>
> As discussed on the mailing list, the extraction test case seems to fail on non-Windows platforms.
> The troublesome test file is sample_fonts_solidconvertor.pdf, and the textextract.log file says the following (^@ is U+0000 and � is U+FFFD):
> Lines differ at index expected:46-253 actual:46-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
> expected line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> actual line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
> expected line was: "^AY^A~^@ý^@á^@í^@é"
> actual line was: "^AY^A~^@�^@�^@�^@�"
> Lines differ at index expected:52-253 actual:52-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
> expected line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> actual line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
> expected line was: "^AY^A~^@ý^@á^@í^@é"
> actual line was: "^AY^A~^@�^@�^@�^@�"
> Preparing to parse sample_fonts_solidconvertor.pdf for sorted test
> Lines differ at index expected:46-253 actual:46-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
> expected line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> actual line was: "^@V^@e^@r^@d^@a^@n^@a^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:0-253 actual:0-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
> expected line was: "^@ý^@á^@í^@é"
> actual line was: "^@�^@�^@�^@�"
> Lines differ at index expected:52-253 actual:52-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
> expected line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@ý^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> actual line was: "^@S^@a^@n^@s^@ ^@s^@e^@r^@i^@f^@:^@ ^@T^@o^@t^@o^@ ^@j^@e^@ ^@p^@o^@k^@u^@s^@n^@�^@ ^@t^@e^@x^@t^@ ^@s^@ ^A"
> Lines differ at index expected:4-253 actual:4-65533
> FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
> expected line was: "^A~^AY^@ý^@á^@í^@é"
> actual line was: "^A~^AY^@�^@�^@�^@�"