Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:30:14 UTC

[jira] [Resolved] (NUTCH-745) MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run

     [ https://issues.apache.org/jira/browse/NUTCH-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-745.
----------------------------------------

    Resolution: Invalid

Closing as a legacy issue.
                
> MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-745
>                 URL: https://issues.apache.org/jira/browse/NUTCH-745
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>         Environment: JDK1.6 + tomcat 6 + Eclipse3.3 + nutch 1.0
>            Reporter: jcore_XiaTian
>
> MyHtmlParser.getParse() returns a non-null ParseResult, so none of the Analyzer-(zh|fr) plugins can run:
> 	public ParseResult getParse(Content content) {
> 		return ParseResult.createParseResult(content.getUrl(),
> 				new ParseStatus(ParseStatus.FAILED,
> 						ParseStatus.FAILED_MISSING_CONTENT,
> 						"No textual content available").getEmptyParse(conf));
> 		// return null;
> 	}
> ===== nutch-site.xml =====
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
>   <description><![CDATA[
>   
>   ]]>  </description>
> </property>
> ===== parse-plugins.xml =====
> <mimeType name="text/html">
> 	<plugin id="parse-myHtml" />
> 	<plugin id="parse-html" />
> </mimeType>
> <alias name="parse-myHtml"
> 	extension-id="org.apache.nutch.parse.html.MyHtmlParser" />
> ===== src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java =====
>   public ParseResult getParse(Content content) {
>     .....
>     // This call is never reached:
>     ParseResult filteredParse = this.htmlParseFilters.filter(content, parseResult,
>                                                              metaTags, root);
>     .......
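
The behaviour described in the report can be illustrated with a small stand-alone sketch. It assumes, as the report implies, that the parsers listed for a MIME type are tried in order and that the first non-null result is used. The names below (ParserChainSketch, NamedParser, parseWithFallback) are hypothetical and are not part of the Nutch API; results are plain strings rather than ParseResult objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these types exist in Nutch.
final class ParserChainSketch {

    /** A parser with a name; it returns a result string, or null to decline. */
    static final class NamedParser {
        final String name;
        final Function<String, String> parse;

        NamedParser(String name, Function<String, String> parse) {
            this.name = name;
            this.parse = parse;
        }
    }

    /**
     * Tries each parser in order and returns the first non-null result.
     * The name of every parser that was actually invoked is appended to trace.
     * Assumption: this mirrors the fallback behaviour the report implies.
     */
    static String parseWithFallback(List<NamedParser> chain, String content,
                                    List<String> trace) {
        for (NamedParser parser : chain) {
            trace.add(parser.name);
            String result = parser.parse.apply(content);
            if (result != null) {
                return result; // later parsers in the chain are never called
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NamedParser html = new NamedParser("parse-html", c -> "parsed: " + c);
        // Like the reported MyHtmlParser: a non-null "failed" result.
        NamedParser failing = new NamedParser("parse-myHtml",
                c -> "FAILED: No textual content available");
        // Like the commented-out "return null": declines the content.
        NamedParser declining = new NamedParser("parse-myHtml", c -> null);

        List<String> trace = new ArrayList<>();
        String result = parseWithFallback(Arrays.asList(failing, html), "<p>hi</p>", trace);
        System.out.println(result + " via " + trace);

        trace = new ArrayList<>();
        result = parseWithFallback(Arrays.asList(declining, html), "<p>hi</p>", trace);
        System.out.println(result + " via " + trace);
    }
}
```

Under that assumption, a parser that returns a non-null "failed" result ends the chain, so parse-html and its filter call are never reached, whereas returning null lets the next parser for the MIME type handle the content.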
