
Posted to dev@poi.apache.org by bu...@apache.org on 2003/06/04 17:06:34 UTC

DO NOT REPLY [Bug 20060] - [PATCH] HDF text extraction patch

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20060>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20060

------- Additional Comments From thierry.guerin@prima-solutions.com  2003-06-04 15:06 -------
I've been working on the exact same thing and came up with different fixes 
that lead to the same result, but without having to remove 
"findFormatting" from the WordDocument class. I have now merged Serge's 
patch with mine. The differences between Serge's modifications and mine are:
Utils.convertBytesToShort: patch to avoid an ArrayIndexOutOfBoundsException.
WordDocument.printTable: patch to avoid a NullPointerException.
As of now, the only Word documents that still fail to parse are the ones that 
throw the "Invalid header signature" error (see bug 11506 for the files). I 
may look into this in the future, but I have no time to do so for now.
The resulting CVS diff follows this message.
Please bear in mind that my modifications, though working, are based only on 
fixes that seemed logical from a programming point of view (tests to avoid 
an ArrayIndexOutOfBoundsException, etc.). I have _no_ knowledge of the Word 
file format, so I might have done something stupid in the process.
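
For readers without the diff at hand, the kind of defensive test described 
above for Utils.convertBytesToShort can be sketched roughly as follows. This 
is only an illustration, not the attached patch: the class name SafeBytes, 
the extra "fallback" parameter, and the choice to return a default value 
rather than throw are all assumptions made for the example. It also assumes 
little-endian byte order, which is what the Word binary format uses.

```java
/** Hypothetical helper for illustration; not the POI Utils class or the patch. */
final class SafeBytes {
    private SafeBytes() {}

    /**
     * Reads a little-endian 16-bit value starting at offset.
     * Returns fallback instead of throwing when the two bytes
     * do not lie fully inside the array (assumed behaviour).
     */
    static short convertBytesToShort(byte[] data, int offset, short fallback) {
        // Guard first: both data[offset] and data[offset + 1] must exist.
        if (data == null || offset < 0 || offset > data.length - 2) {
            return fallback;
        }
        // Mask to 0..255 so negative bytes do not sign-extend into the result.
        int low = data[offset] & 0xFF;
        int high = data[offset + 1] & 0xFF;
        return (short) ((high << 8) | low);
    }
}
```

Whether a silent fallback is the right behaviour depends on the caller: it 
keeps text extraction going on slightly malformed files, but it can also hide 
a wrong offset computed earlier in the parse.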