Posted to user@nutch.apache.org by Matthew Vickery <vi...@gmail.com> on 2007/09/27 19:47:53 UTC

Is it possible to crawl a site that requires a log in?

Hi,

I have a pretty standard installation of MediaWiki (version 1.10) and
wish to use Nutch for search rather than the built-in search
facility, just as Nutch is used on the Mozilla Developer
Center: http://developer.mozilla.org/

However, my wiki is a company intranet, so users must log in
before they can view any pages beyond the login page.  Is it possible
to crawl a MediaWiki site that requires a login and uses the standard
MediaWiki cookie-based authentication?

Many thanks in advance,

Matthew