Posted to user@tika.apache.org by reinhard schwab <re...@aon.at> on 2010/08/30 22:18:43 UTC
NoSuchElementException with HtmlParser
I get this exception when parsing the following URL:
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PG01&s1=%22ceusters,+werner%22.IN.&OS=IN/%22ceusters,+werner%22&RS=IN/%22ceusters,+werner%22
text/html
Using Tika parser org.apache.tika.parser.html.HtmlParser for mime-type text/html
java.util.NoSuchElementException
at java.util.LinkedList.remove(LinkedList.java:788)
at java.util.LinkedList.removeFirst(LinkedList.java:134)
at org.apache.tika.sax.xpath.MatchingContentHandler.endElement(MatchingContentHandler.java:75)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:248)
at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:283)
at org.apache.tika.parser.html.HtmlHandler.endElement(HtmlHandler.java:185)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.parser.html.XHTMLDowngradeHandler.endElement(XHTMLDowngradeHandler.java:68)
at org.ccil.cowan.tagsoup.Parser.pop(Parser.java:736)
at org.ccil.cowan.tagsoup.Parser.etag_basic(Parser.java:706)
at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1019)
at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:565)
at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:190)
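
The top frames show LinkedList.removeFirst() being called on an empty list from MatchingContentHandler.endElement. Below is a minimal sketch of that failure mode. It assumes, without confirmation from the Tika source, that the handler keeps one list entry per open element, so an endElement with no matching startElement would hit an empty list. The class and method names are hypothetical and are not part of Tika.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a handler that tracks open elements.
// This is NOT Tika's MatchingContentHandler, only an illustration.
public class UnbalancedEndElementDemo {
    // Assumption: one entry is pushed per startElement and popped per endElement.
    private final LinkedList<String> open = new LinkedList<String>();

    void startElement(String name) {
        open.addFirst(name);
    }

    void endElement(String name) {
        // Throws NoSuchElementException if no element is currently open.
        open.removeFirst();
    }

    // Replays the given number of start and end events and reports
    // whether the end events ran past the start events.
    static boolean throwsOnExtraEnd(int starts, int ends) {
        UnbalancedEndElementDemo handler = new UnbalancedEndElementDemo();
        try {
            for (int i = 0; i < starts; i++) {
                handler.startElement("p");
            }
            for (int i = 0; i < ends; i++) {
                handler.endElement("p");
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnExtraEnd(1, 1)); // balanced events
        System.out.println(throwsOnExtraEnd(1, 2)); // one extra end event
    }
}
```

If this is what happens here, the page's markup probably causes one more endElement event than startElement events to reach the handler, but that is a guess from the trace alone.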
best regards
reinhard