Posted to dev@lucene.apache.org by Du...@Sun.COM on 2007/10/31 07:22:36 UTC

Using HTMLDocument

Hi folks,

    I am using the HtmlDocument class from the Lucene Ant contrib:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/ant/src/java/org/apache/lucene/ant/

    HtmlDocument hd = new HtmlDocument(p.getInputStream());
    doc.add(new Field(F_CONTENTS, new StringReader(hd.getBody()),
                      Field.TermVector.YES));

    I keep getting errors like the following, which appear to come from HTML Tidy:

line 29 column 27 - Error: <st1:place> is not recognized!
line 29 column 47 - Error: <st1:country-region> is not recognized!
line 36 column 21 - Error: <o:p> is not recognized!
line 39 column 67 - Error: <o:p> is not recognized!
line 43 column 45 - Error: <o:p> is not recognized!
line 46 column 52 - Error: <o:p> is not recognized!
line 54 column 27 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 3 column 331 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 1 column 1,214 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 15 column 1 - Error: o:smarttagtype is not recognized!
line 17 column 1 - Error: o:smarttagtype is not recognized!
line 19 column 1 - Error: o:smarttagtype is not recognized!
line 21 column 1 - Error: o:smarttagtype is not recognized!
line 23 column 1 - Error: o:smarttagtype is not recognized!
line 111 column 48 - Error: <o:p> is not recognized!
line 111 column 196 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 1 column 1,444 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 1 column 1,384 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 662 column 11 - Error: <st1:city> is not recognized!
line 663 column 12 - Error: <st1:place> is not recognized!
line 682 column 91 - Error: <st1:personname> is not recognized!
line 686 column 87 - Error: <st1:place> is not recognized!
line 687 column 12 - Error: <st1:placename> is not recognized!
line 687 column 62 - Error: <st1:placetype> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 283 column 61 - Error: <o:p> is not recognized!
line 288 column 72 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 118 column 41 - Error: <o:p> is not recognized!
line 151 column 34 - Error: <o:p> is not recognized!
line 153 column 22 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 174 column 43 - Error: <o:p> is not recognized!
line 209 column 36 - Error: <o:p> is not recognized!
line 212 column 17 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 163 column 47 - Error: <o:p> is not recognized!
line 198 column 38 - Error: <o:p> is not recognized!
line 200 column 28 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 123 column 18 - Error: <font> missing '>' for end of tag
line 195 column 25 - Error: <font> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.


line 123 column 18 - Error: <font> missing '>' for end of tag
line 195 column 25 - Error: <font> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.
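
Most of the unrecognized tags above (<o:p>, <o:smarttagtype>, <st1:...>) look like Microsoft Office markup, so one thing that might be worth trying is stripping them from the HTML before it reaches HtmlDocument. The following is only a rough sketch of that idea: the class name OfficeTagStripper and its strip method are hypothetical, it only handles the "o:" and "st1:" prefixes seen in the log, and it does nothing about the "<img> missing '>'" or "<font> missing '>'" errors.

```java
import java.util.regex.Pattern;

public class OfficeTagStripper {
    // Hypothetical helper, not part of Lucene or JTidy.
    // Matches opening, closing, and self-closing tags whose name starts
    // with "o:" or "st1:", with or without attributes.
    private static final Pattern OFFICE_TAG = Pattern.compile(
        "</?(?:o|st1):[A-Za-z][\\w-]*(?:\\s[^>]*)?/?>",
        Pattern.CASE_INSENSITIVE);

    // Removes the tags themselves but keeps the text between them.
    public static String strip(String html) {
        return OFFICE_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Visit <st1:place>Paris</st1:place><o:p></o:p></p>";
        System.out.println(strip(html)); // prints "<p>Visit Paris</p>"
    }
}
```

The cleaned string would then have to be turned back into an InputStream (for example a ByteArrayInputStream over its bytes) before being passed to the HtmlDocument constructor shown earlier.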

_Durga