Posted to user@nutch.apache.org by Jason Shi <nu...@gmail.com> on 2011/03/07 04:29:15 UTC

Re: web search returns less results than command search

Thank you, guys. I debugged search.jsp, and it seems Nutch can recognize the
sentence that I input. I'm a little confused about how Nutch parses a
query string and determines which fields it will search in. I used Luke to
check the indexes, and they seem fine; I can get search results. By the way, I
changed the tokenStream function in NutchDocumentAnalyzer.java to use a
Chinese analyzer. I have been debugging the whole day and still have no clue.
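
One way to check the character-encoding suspicion raised in the quoted reply
below is to reproduce the mismatch outside Nutch. The following is only a
sketch, not Nutch code: the class and method names are hypothetical, and it
assumes the web server decodes the UTF-8 bytes of the query as ISO-8859-1,
which nothing in this thread confirms.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: shows what happens to a query
// when its UTF-8 bytes are decoded with the wrong charset.
class QueryEncodingSketch {

    // Simulates a server that reads UTF-8 request bytes as ISO-8859-1
    // (assumed misconfiguration, for illustration only).
    static String misdecode(String query) {
        byte[] utf8 = query.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three-character query from the original post, written with
        // Unicode escapes so the source file encoding does not matter.
        String query = "\u8ba1\u7b97\u673a";
        String garbled = misdecode(query);

        // The garbled string no longer matches any indexed Chinese term.
        System.out.println("garbled equals query: " + garbled.equals(query));
        System.out.println("garbled length: " + garbled.length());
        System.out.println("repaired equals query: " + repair(garbled).equals(query));
    }
}
```

If the string logged by the web application looks like the garbled form
rather than the original query, the request encoding is the likely culprit;
if it matches the original, the problem probably lies elsewhere.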

2011/2/28 McGibbney, Lewis John <Le...@gcu.ac.uk>

> to add to this...
>
> Please try Solr for search functionality. Solr.war
>
> Thank you Lewis
>
> ________________________________________
> From: Alexander Aristov [alexander.aristov@gmail.com]
> Sent: 28 February 2011 09:20
> To: user@nutch.apache.org
> Cc: Jason Shi; nutch-user@lucene.apache.org
> Subject: Re: web search returns less results than command search
>
> Hi
>
> Firstly, I would suspect character encoding issues. Turn on tracing on the web
> server and check which sentence is searched.
>
> The next thing is dedup. It can reduce the number of results, and it's turned on by
> default. But of course it should not reduce them to 0.
>
> Best Regards
> Alexander Aristov
>
>
> On 28 February 2011 05:53, Jason Shi <nu...@gmail.com> wrote:
>
> > Hi guys, I'm using nutch-1.0 for Chinese web search. I changed
> > NutchDocumentAnalyzer.java to use imdict-chinese-analyzer, which is
> > dedicated to Chinese word segmentation. After successfully crawling my
> > computer department's website and deploying nutch-1.0.war, I found that
> > the Nutch web search returns far fewer results than the command-line
> > search. For example, the command
> > "bin/nutch org.apache.nutch.searcher.NutchBean 计算机" returns 265 hits,
> > but the web search returns 0 results.
> > Any help would be greatly appreciated.
> >