Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:30:14 UTC
[jira] [Resolved] (NUTCH-745) MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
[ https://issues.apache.org/jira/browse/NUTCH-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-745.
----------------------------------------
Resolution: Invalid
Closing legacy issue.
> MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
> ------------------------------------------------------------------------
>
> Key: NUTCH-745
> URL: https://issues.apache.org/jira/browse/NUTCH-745
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Environment: JDK 1.6 + Tomcat 6 + Eclipse 3.3 + Nutch 1.0
> Reporter: jcore_XiaTian
>
> MyHtmlParser.getParse() returns a non-null result, so none of the Analyzer-(zh|fr) plugins can run:
> public ParseResult getParse(Content content) {
>     // Returns a non-null ParseResult that carries a FAILED status and an empty parse.
>     return ParseResult.createParseResult(content.getUrl(),
>         new ParseStatus(ParseStatus.FAILED,
>                         ParseStatus.FAILED_MISSING_CONTENT,
>                         "No textual content available").getEmptyParse(conf));
>
>     // return null;
> }
> ========nutch-site.xml=======
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
>   <description><![CDATA[
>
>   ]]> </description>
> </property>
> ==========parse-plugins.xml============
> <mimeType name="text/html">
>   <plugin id="parse-myHtml" />
>   <plugin id="parse-html" />
> </mimeType>
> <alias name="parse-myHtml"
>        extension-id="org.apache.nutch.parse.html.MyHtmlParser" />
> ===src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java========
> public ParseResult getParse(Content content) {
>     .....
>     // this code is never reached:
>     ParseResult filteredParse = this.htmlParseFilters.filter(content, parseResult,
>         metaTags, root);
>     .......
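The report depends on how the parsers listed for text/html in parse-plugins.xml are tried in order. The sketch below is a minimal model of that fallback. It assumes the caller moves on to the next parser only when the current one returns nothing, and stops at the first result it gets back, even if that result reports a failure. Check ParseUtil in your Nutch version for the exact condition: it may also skip empty results, which the sketch collapses into a null check.

`ParserChainSketch`, `SimpleParser`, `Result` and `parseWithFallback` are hypothetical names, not Nutch APIs.

```java
import java.util.Arrays;
import java.util.List;

public class ParserChainSketch {

    /** Hypothetical stand-in for a parse result: a success flag plus extracted text. */
    static final class Result {
        final boolean success;
        final String text;

        Result(boolean success, String text) {
            this.success = success;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a parser plugin; returning null means "not handled". */
    interface SimpleParser {
        Result parse(String content);
    }

    /**
     * Tries each parser in order and returns the first non-null result.
     * A non-null result ends the chain even when it reports failure,
     * so later parsers (and anything they would trigger) never run.
     */
    static Result parseWithFallback(List<SimpleParser> parsers, String content) {
        for (SimpleParser parser : parsers) {
            Result result = parser.parse(content);
            if (result != null) {
                return result;
            }
        }
        return new Result(false, ""); // no parser produced anything
    }

    public static void main(String[] args) {
        // Like MyHtmlParser in the report: always returns a failed, empty result.
        SimpleParser failingCustom = content -> new Result(false, "");
        // Like the commented-out "return null;": declines the content.
        SimpleParser decliningCustom = content -> null;
        // Stand-in for parse-html: succeeds and extracts text.
        SimpleParser html = content -> new Result(true, content.trim());

        Result a = parseWithFallback(Arrays.asList(failingCustom, html), " page ");
        Result b = parseWithFallback(Arrays.asList(decliningCustom, html), " page ");
        System.out.println("failed result returned: success=" + a.success);
        System.out.println("null returned: success=" + b.success + ", text=" + b.text);
    }
}
```

Under this model, returning the failed ParseResult hides parse-html completely. Returning null (the commented-out line) lets the chain reach it.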
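A side note on the nutch-site.xml in the report: its plugin.includes value lists `analysis-(zh)` only. A plugin named `analysis-fr` would therefore not be loaded, whatever happens during parsing.

The check below assumes plugin.includes is applied as a Java regular expression that must match the whole plugin id. That is how the plugin repository is understood to use it, but confirm it for your version. The pattern is split across lines but otherwise copied verbatim from the report. `PluginIncludesCheck` and `isIncluded` are hypothetical names.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // The plugin.includes value from the report's nutch-site.xml.
    static final Pattern INCLUDES = Pattern.compile(
        "protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)");

    /** True if the entire plugin id matches the include pattern. */
    static boolean isIncluded(String pluginId) {
        return INCLUDES.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"parse-myHtml", "parse-html", "language-identifier", "analysis-zh", "analysis-fr"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

Running it prints `true` for the first four ids and `false` for `analysis-fr`.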
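The HtmlParser excerpt in the report shows that the HTML parse filters are invoked from inside `HtmlParser.getParse`. If that method is never called, none of those filters run either.

The sketch below is a minimal model of that dependency. `FilterChainSketch`, `HtmlFilter`, `parseHtml`, `parseWithoutFilters` and the toy `lang` attribute reader are hypothetical. They only stand in for `htmlParseFilters.filter(...)` and a filter that records the page language.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {

    /** Hypothetical stand-in for an HTML parse filter: it may add metadata to the parse. */
    interface HtmlFilter {
        void filter(String html, Map<String, String> metadata);
    }

    /** Toy filter: copies the first lang="..." attribute into the metadata, if present. */
    static final HtmlFilter LANG_FILTER = (html, metadata) -> {
        int start = html.indexOf("lang=\"");
        if (start >= 0) {
            int valueStart = start + "lang=\"".length();
            int valueEnd = html.indexOf('"', valueStart);
            if (valueEnd > valueStart) {
                metadata.put("lang", html.substring(valueStart, valueEnd));
            }
        }
    };

    /** Models HtmlParser: (actual parsing elided) then runs every registered filter. */
    static Map<String, String> parseHtml(String html, List<HtmlFilter> filters) {
        Map<String, String> metadata = new HashMap<>();
        for (HtmlFilter f : filters) {
            f.filter(html, metadata);
        }
        return metadata;
    }

    /** Models a custom parser that returns a result without ever calling the filters. */
    static Map<String, String> parseWithoutFilters(String html) {
        return new HashMap<>();
    }

    public static void main(String[] args) {
        String page = "<html lang=\"zh\"><body>...</body></html>";
        List<HtmlFilter> filters = Arrays.asList(LANG_FILTER);
        System.out.println("via parseHtml: lang=" + parseHtml(page, filters).get("lang"));
        System.out.println("via parseWithoutFilters: lang=" + parseWithoutFilters(page).get("lang"));
    }
}
```

The first line prints `lang=zh`. The second prints `lang=null`, because nothing downstream ever sees the metadata the filters would have added.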
--
This message is automatically generated by JIRA.