Posted to user@tika.apache.org by reinhard schwab <re...@aon.at> on 2010/08/30 22:18:43 UTC

NoSuchElementException with HtmlParser

I get the exception below when parsing this page with Tika:

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PG01&s1=%22ceusters,+werner%22.IN.&OS=IN/%22ceusters,+werner%22&RS=IN/%22ceusters,+werner%22

text/html
Using Tika parser org.apache.tika.parser.html.HtmlParser for mime-type text/html
java.util.NoSuchElementException
    at java.util.LinkedList.remove(LinkedList.java:788)
    at java.util.LinkedList.removeFirst(LinkedList.java:134)
    at org.apache.tika.sax.xpath.MatchingContentHandler.endElement(MatchingContentHandler.java:75)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:248)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:283)
    at org.apache.tika.parser.html.HtmlHandler.endElement(HtmlHandler.java:185)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.parser.html.XHTMLDowngradeHandler.endElement(XHTMLDowngradeHandler.java:68)
    at org.ccil.cowan.tagsoup.Parser.pop(Parser.java:736)
    at org.ccil.cowan.tagsoup.Parser.etag_basic(Parser.java:706)
    at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1019)
    at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:565)
    at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
    at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:190)
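
The top frames show LinkedList.removeFirst() being called from an endElement() callback, and removeFirst() throws NoSuchElementException when the list is empty. My guess, and it is only a guess, is that the handler receives one more endElement() than startElement() for this page. The sketch below is not Tika code: UnbalancedEndElementDemo and its methods are hypothetical names, used only to show that an unbalanced end event on a LinkedList-backed stack gives the same exception.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a SAX-style handler that keeps one stack entry
// per open element. This is NOT the Tika implementation.
class UnbalancedEndElementDemo {
    private final LinkedList<String> open = new LinkedList<>();

    // Push an entry when an element is opened.
    void startElement(String name) {
        open.addFirst(name);
    }

    // Pop an entry when an element is closed; throws
    // NoSuchElementException if nothing is left to pop.
    void endElement(String name) {
        open.removeFirst();
    }

    // Returns true if a surplus endElement() raises NoSuchElementException.
    static boolean throwsOnExtraEnd() {
        UnbalancedEndElementDemo handler = new UnbalancedEndElementDemo();
        handler.startElement("html");
        handler.endElement("html");
        try {
            handler.endElement("html"); // one end event too many
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("extra endElement throws: " + throwsOnExtraEnd());
    }
}
```

If that is what happens here, checking for an empty stack before popping would avoid the exception, but whether that is the right fix for the real handler I can not say.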

best regards
reinhard