Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/19 01:04:21 UTC
parser warnings
What does the following log message mean?
2011-07-19 01:00:07,034 WARN parse.ParserFactory -
ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to
contentType application/xhtml+xml via parse-plugins.xml, but its
plugin.xml file does not claim to support contentType:
application/xhtml+xml
Does that mean that my HTML parser is not receiving some of the crawled data?
best.
Re: parser warnings
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Cam,
It means that the plugin.xml file of the HtmlParser plugin
does not claim application/xhtml+xml as a supported MIME type,
even though parse-plugins.xml maps the content type
application/xhtml+xml to that parser.
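As a rough illustration of the mismatch described above, here is a minimal Java sketch. It assumes a simplified model in which a plugin's plugin.xml yields a list of claimed content types and parse-plugins.xml maps a content type to a plugin ID. The class and method names (ParserMappingCheck, claims, checkMapping) are hypothetical and are not Nutch's API, and the claimed type list ("text/html" only) is an assumption for the example, not a quote from Nutch's actual parse-html plugin.xml. The warning text is modelled on the log line in the question.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the consistency check behind the
 * ParserFactory warning. None of these names are Nutch APIs.
 */
public class ParserMappingCheck {

    /**
     * Returns true if the content types claimed in a (hypothetical) plugin.xml
     * cover the given content type. "*" is treated as a wildcard; other claimed
     * types are compared literally, not as regular expressions (a simplification).
     */
    public static boolean claims(List<String> claimedTypes, String contentType) {
        return claimedTypes.contains("*") || claimedTypes.contains(contentType);
    }

    /**
     * Returns a warning modelled on the logged one when parse-plugins.xml maps
     * contentType to a plugin whose plugin.xml does not claim it, or null when
     * the mapping and the claim agree.
     */
    public static String checkMapping(String pluginId, List<String> claimedTypes, String contentType) {
        if (claims(claimedTypes, contentType)) {
            return null;
        }
        return "ParserFactory:Plugin: " + pluginId
                + " mapped to contentType " + contentType
                + " via parse-plugins.xml, but its plugin.xml file"
                + " does not claim to support contentType: " + contentType;
    }

    public static void main(String[] args) {
        String pluginId = "org.apache.nutch.parse.html.HtmlParser";
        // Assumption for illustration: plugin.xml claims only text/html.
        List<String> claimed = List.of("text/html");
        // Assumption for illustration: parse-plugins.xml maps both types to HtmlParser.
        for (String mapped : List.of("text/html", "application/xhtml+xml")) {
            String warning = checkMapping(pluginId, claimed, mapped);
            System.out.println(mapped + " -> " + (warning == null ? "OK" : "WARN " + warning));
        }
    }
}
```

In this model, the text/html mapping reports OK and the application/xhtml+xml mapping produces the warning; adding "application/xhtml+xml" to the claimed list makes both mappings consistent.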
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++