Posted to dev@pdfbox.apache.org by "Brian Carrier (JIRA)" <ji...@apache.org> on 2008/11/18 15:37:44 UTC

[jira] Commented: (PDFBOX-388) Store expected test output as UTF-8 text files with native line endings

    [ https://issues.apache.org/jira/browse/PDFBOX-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648612#action_12648612 ] 

Brian Carrier commented on PDFBOX-388:
--------------------------------------

I agree that the current setup makes it difficult to debug and review failures. Another approach that I was looking at (but have not yet tried) is to drop in something like TextDiff so that the regression tests use a more intelligent diffing process.

http://www.surfscranton.com/Architecture/TextDiff.htm
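
As a rough illustration of the idea only: the sketch below compares expected and actual text line by line, ignoring line-ending differences, and reports each mismatching line. It is a plain positional comparison, much simpler than a real diff, and it does not use TextDiff's API. The class and method names (LineDiffSketch, splitLines, diffLines) are made up for this example and are not part of PDFBox or TextDiff.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of PDFBox or TextDiff. */
public class LineDiffSketch {

    /** Splits text into lines, treating \r\n, \r and \n as equivalent line endings. */
    public static String[] splitLines(String text) {
        // A negative limit keeps trailing empty strings, so a missing
        // final newline still shows up as a difference.
        return text.split("\r\n|\r|\n", -1);
    }

    /**
     * Returns one readable entry per line position where expected and actual differ.
     * This compares lines by position only; an inserted line will make every
     * following line appear changed, which a real diff algorithm would avoid.
     */
    public static List<String> diffLines(String expected, String actual) {
        String[] exp = splitLines(expected);
        String[] act = splitLines(actual);
        List<String> report = new ArrayList<>();
        int max = Math.max(exp.length, act.length);
        for (int i = 0; i < max; i++) {
            String e = i < exp.length ? exp[i] : "<missing>";
            String a = i < act.length ? act[i] : "<missing>";
            if (!e.equals(a)) {
                report.add("line " + (i + 1) + ": expected [" + e + "] but was [" + a + "]");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Same content except for line endings and one changed character.
        String expected = "Hello\r\nWorld\r\n";
        String actual = "Hello\nWorId\n";
        for (String entry : diffLines(expected, actual)) {
            System.out.println(entry);
        }
    }
}
```

A regression test could fail with the returned entries as its message, so that a failure points at the offending lines instead of only saying that the two files differ.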



> Store expected test output as UTF-8 text files with native line endings
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-388
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-388
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> Currently the expected test output files in test/input are stored as UTF-16 files marked as application/octet-stream. This makes it hard to report or review changes to text extraction output.
> We could improve this by modifying the test suite to produce UTF-8 with native line endings and by updating the expected output files accordingly. Then any changes could be easily reported in patch format.
