Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2008/09/16 18:30:08 UTC
search
Hi,
Are the config files in the Nutch webapp (i.e. under WEB-INF/classes), such as nutch-site.xml and crawl-urlfilter.txt, actually used by the webapp when searching?
The reason I ask is that when I search for something, I get back the following URLs:
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=3
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=4
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=5
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=2
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=1
These are all effectively the same page. I still want the crawl to parse them, but I want the webapp search to return only the URL up to the query string, e.g.:
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5
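To make that concrete, here is a minimal sketch in plain Java of the collapsing I have in mind. It is not Nutch code: the class and method names (QueryStripper, stripQuery, collapse) are made up for illustration, and it assumes that everything after the first '?' can be dropped safely. Where something like this would hook into the webapp is exactly what I don't know.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
public class QueryStripper {

    // Returns the URL up to (not including) the first '?',
    // or the URL unchanged if it has no query string.
    static String stripQuery(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    // Collapses URLs that differ only in their query string,
    // keeping the order in which they were first seen.
    static List<String> collapse(List<String> urls) {
        Set<String> seen = new LinkedHashSet<>();
        for (String u : urls) {
            seen.add(stripQuery(u));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String base = "http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/"
                + "Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5";
        List<String> hits = Arrays.asList(
                base + "?OpenDocument&ExpandSection=3",
                base + "?OpenDocument&ExpandSection=4",
                base + "?OpenDocument&ExpandSection=5");
        // All three hits should collapse to the single base URL.
        System.out.println(collapse(hits));
    }
}
```

Plain string handling is used rather than java.net.URI so that the '+' and '&' characters in the path are left exactly as they are.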
Hope that makes sense.
Thanks for any help,
Ed.