Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/19 01:04:21 UTC

parser warnings

What does the following log mean:

2011-07-19 01:00:07,034 WARN  parse.ParserFactory -
ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to
contentType application/xhtml+xml via parse-plugins.xml, but its
plugin.xml file does not claim to support contentType:
application/xhtml+xml


Does that mean that my html parser is not getting part of the crawled data?

best.

Re: parser warnings

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Cam,

It means that the parse-plugins.xml file maps the content type
application/xhtml+xml to the HtmlParser plugin, but that plugin's own
plugin.xml file does not list application/xhtml+xml among the content
types it supports.
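
To make the mismatch concrete, here is a rough sketch in Python that compares the two files. It is not Nutch code. The XML fragments are trimmed, hypothetical stand-ins for the real parse-plugins.xml and plugin.xml, the helper names are made up for illustration, and treating the contentType value as a plain "|"-separated list is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment of conf/parse-plugins.xml (assumption).
PARSE_PLUGINS_XML = """
<parse-plugins>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html"/>
  </mimeType>
</parse-plugins>
"""

# Hypothetical, trimmed fragment of the parse-html plugin.xml (assumption).
PLUGIN_XML = """
<plugin id="parse-html">
  <extension point="org.apache.nutch.parse.Parser">
    <implementation id="org.apache.nutch.parse.html.HtmlParser"
                    class="org.apache.nutch.parse.html.HtmlParser">
      <parameter name="contentType" value="text/html"/>
    </implementation>
  </extension>
</plugin>
"""


def mapped_types(parse_plugins_xml, plugin_id):
    """Content types that parse-plugins.xml maps to the given plugin id."""
    root = ET.fromstring(parse_plugins_xml)
    return [
        mime.get("name")
        for mime in root.iter("mimeType")
        if any(p.get("id") == plugin_id for p in mime.iter("plugin"))
    ]


def claimed_types(plugin_xml):
    """Content types the plugin's own plugin.xml claims to support."""
    root = ET.fromstring(plugin_xml)
    claimed = set()
    for param in root.iter("parameter"):
        if param.get("name") == "contentType":
            # Simplification: split on "|" instead of regex matching.
            claimed.update(param.get("value", "").split("|"))
    return claimed


def unclaimed(parse_plugins_xml, plugin_xml):
    """Mapped content types that the plugin does not claim."""
    plugin_id = ET.fromstring(plugin_xml).get("id")
    claimed = claimed_types(plugin_xml)
    return [t for t in mapped_types(parse_plugins_xml, plugin_id)
            if t not in claimed]


print(unclaimed(PARSE_PLUGINS_XML, PLUGIN_XML))
# prints ['application/xhtml+xml']
```

If the HTML parser is meant to handle XHTML, one possible way to make the two files agree is to widen the contentType value in that plugin.xml to text/html|application/xhtml+xml. With that change the sketch above reports no unclaimed types.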

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++