Posted to user@nutch.apache.org by Jason Shi <nu...@gmail.com> on 2011/02/28 03:53:59 UTC

web search returns less results than command search

Hi guys, I'm using nutch-1.0 for Chinese web search. I changed
NutchDocumentAnalyzer.java to use imdict-chinese-analyzer, which is dedicated
to Chinese word segmentation. After successfully crawling my computer
department's website and deploying nutch-1.0.war, I found that the Nutch web
search returns far fewer results than the command-line search. For example, the
command "bin/nutch org.apache.nutch.searcher.NutchBean 计算机" returns
265 hits, but the web search returns 0 results.
Any help would be greatly appreciated.

Re: web search returns less results than command search

Posted by Alexander Aristov <al...@gmail.com>.
Hi

First, I would suspect a character encoding issue. Turn on tracing on the web
server and check exactly what query string is being searched.
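
To illustrate what such an encoding problem could look like, here is a minimal
Java sketch. It assumes (this is not confirmed in the thread) that the browser
sends the query as UTF-8 bytes and the servlet container decodes them as
ISO-8859-1. The class and method names are made up for the example and are not
part of Nutch.

```java
import java.nio.charset.StandardCharsets;

public class QueryEncodingCheck {

    // Simulates a container that reads UTF-8 request bytes as ISO-8859-1.
    static String misdecode(String query) {
        byte[] utf8 = query.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage: take the Latin-1 bytes back and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Formats a string as code points, which is easier to compare in a log
    // than characters the console may not be able to display.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The query from the original post, written with escapes so the
        // example does not depend on the source file encoding.
        String query = "\u8BA1\u7B97\u673A";
        String garbled = misdecode(query);

        System.out.println("intended : " + codePoints(query));
        System.out.println("misread  : " + codePoints(garbled));
        System.out.println("repaired : " + codePoints(repair(garbled)));
    }
}
```

If the query traced on the server looks like the "misread" line (nine
characters in the U+0080 to U+00FF range instead of three CJK characters), the
index is being searched for a term it does not contain, which would explain 0
hits. If the web application runs on Tomcat (also an assumption), the
URIEncoding attribute on the Connector element in server.xml is the usual
place to check.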

The next thing to check is dedup. It can reduce the number of results and it is
turned on by default, but of course it should not reduce them to 0.

Best Regards
Alexander Aristov

