Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2008/09/16 18:30:08 UTC

search

Hi,

I wondered whether the config files in the Nutch webapp (i.e. those in WEB-INF/classes), such as nutch-site.xml and crawl-urlfilter.txt, are used by the webapp when searching?
The reason I ask is that when I search for something, I get back the following URLs:


http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=3 
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=4 
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=5 
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=2 
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=1 

which are all effectively the same page. So although I want the crawl to parse these URLs, I want the webapp search to return only the URL up to the query string, e.g.:

http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5
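
To make the intent concrete, here is a minimal sketch in plain Java of the behaviour I am after. It is not Nutch code and uses no Nutch APIs: the class and method names (QueryStripper, stripQuery, collapse) are made up for illustration, and it simply assumes the query string starts at the first '?'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Nutch.
public class QueryStripper {

    // Return the URL up to (not including) the first '?'.
    // URLs without a '?' are returned unchanged.
    public static String stripQuery(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    // Strip the query string from every hit and drop duplicates,
    // keeping the order in which each distinct URL first appeared.
    public static List<String> collapse(List<String> hits) {
        Set<String> seen = new LinkedHashSet<>();
        for (String hit : hits) {
            seen.add(stripQuery(hit));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String base = "http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/"
                + "Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5";
        List<String> hits = Arrays.asList(
                base + "?OpenDocument&ExpandSection=3",
                base + "?OpenDocument&ExpandSection=4",
                base + "?OpenDocument&ExpandSection=5");
        // All three hits collapse to the single URL without its query string.
        for (String url : collapse(hits)) {
            System.out.println(url);
        }
    }
}
```

So the five hits above would collapse into the one URL. Whether this is better done at crawl/index time or in the webapp is exactly what I am unsure about.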

Hope that makes sense.

Thanks for any help,

Ed.
