Posted to user@nutch.apache.org by "jet2web@trashmail.net" <je...@trashmail.net> on 2009/04/10 02:41:54 UTC

java.nio.charset.IllegalCharsetNameException

Hello!

I get the following error when I run Nutch 1.0 and 1.1-dev on some sites:

Error parsing: http://www.nasa.gov/centers/goddard/home/index.html: failed(2,200): java.nio.charset.IllegalCharsetNameException: .utf8

Nutch log:

2009-04-10 02:23:56,902 WARN  parse.html - java.nio.charset.IllegalCharsetNameException: .utf8
2009-04-10 02:23:56,903 WARN  parse.html - at java.nio.charset.Charset.checkName(Charset.java:285)
2009-04-10 02:23:56,903 WARN  parse.html - at java.nio.charset.Charset.lookup2(Charset.java:459)
2009-04-10 02:23:56,903 WARN  parse.html - at java.nio.charset.Charset.lookup(Charset.java:438)
2009-04-10 02:23:56,903 WARN  parse.html - at java.nio.charset.Charset.isSupported(Charset.java:480)
2009-04-10 02:23:56,903 WARN  parse.html - at org.apache.nutch.util.EncodingDetector.resolveEncodingAlias(EncodingDetector.java:310)
2009-04-10 02:23:56,903 WARN  parse.html - at org.apache.nutch.util.EncodingDetector.addClue(EncodingDetector.java:201)
2009-04-10 02:23:56,903 WARN  parse.html - at org.apache.nutch.util.EncodingDetector.addClue(EncodingDetector.java:208)
2009-04-10 02:23:56,903 WARN  parse.html - at org.apache.nutch.util.EncodingDetector.autoDetectClues(EncodingDetector.java:193)
2009-04-10 02:23:56,903 WARN  parse.html - at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:136)
2009-04-10 02:23:56,904 WARN  parse.html - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
2009-04-10 02:23:56,904 WARN  parse.html - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:766)
2009-04-10 02:23:56,904 WARN  parse.html - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:552)
2009-04-10 02:23:56,911 WARN  fetcher.Fetcher - Error parsing: http://www.nasa.gov/centers/goddard/home/index.html: failed(2,200): java.nio.charset.IllegalCharsetNameException: .utf8
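
Judging from the trace, the exception comes out of Charset.isSupported() when it is handed the name ".utf8", so it can probably be reproduced outside Nutch. Below is a minimal sketch of that, under my own assumptions: the class CharsetNameCheck and the helper isUsableCharset are hypothetical names I made up for illustration and are not part of Nutch or of EncodingDetector. The sketch only shows that the JDK rejects a charset name starting with a dot, and one possible way a caller could treat such a name as "unsupported" instead of failing.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetNameCheck {

    // Hypothetical helper (not Nutch code): returns true only if the name is
    // syntactically legal AND supported by this JVM; returns false otherwise.
    static boolean isUsableCharset(String name) {
        if (name == null) {
            return false; // Charset.isSupported(null) would throw IllegalArgumentException
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Thrown for names such as ".utf8", which start with an illegal character.
            return false;
        }
    }

    public static void main(String[] args) {
        // Reproduce the exception seen in the log.
        try {
            Charset.isSupported(".utf8");
            System.out.println("no exception");
        } catch (IllegalCharsetNameException e) {
            System.out.println("IllegalCharsetNameException: " + e.getCharsetName());
        }

        // The guarded variant reports the bad name as unusable instead of throwing.
        System.out.println("utf-8 usable: " + isUsableCharset("utf-8"));
        System.out.println(".utf8 usable: " + isUsableCharset(".utf8"));
    }
}
```

Whether a guard like this belongs in EncodingDetector, or whether the bad name should be cleaned up earlier, I cannot tell from the log alone.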


Thanks!