Posted to dev@nutch.apache.org by Jack Tang <hi...@gmail.com> on 2005/09/12 21:54:25 UTC

Re: crawling protected pages

Hi Andrzej

There is an HttpAuthenticationFactory class in the protocol-httpclient
plugin, but I doubt whether RFC 2617 basic authentication actually works:
I cannot find any reference to the HttpAuthenticationFactory class.
Did I miss something?

Regards
/Jack

On 9/13/05, Andrzej Bialecki <ab...@getopt.org> wrote:
> Edward Quick wrote:
> > Hi,
> >
> > I posted to the user list but didn't get a reply. I want to crawl a
> > protected site, but there doesn't seem to be an option for that in Nutch
> > at the moment.
> >
> > However, it doesn't sound like something that would be too hard to add,
> > assuming the Java HTTP client library can handle that. As I'm not
> > familiar with the code, could someone point me at the file (or files) in
> > the source which do the crawling please? I'm not professing to be a top
> > Java programmer (Perl's my speciality) but I'll give it a shot, unless
> > anyone else wants to?!
> 
> The quick hack would be to add the necessary code somewhere in
> protocol-httpclient. Eventually though, I think Nutch should grow an
> authentication factory, which would supply needed credentials to other
> plugins.
> 
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 


-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Re: crawling protected pages

Posted by Andrzej Bialecki <ab...@getopt.org>.
Jack Tang wrote:
> Hi Andrzej
> 
> There is an HttpAuthenticationFactory class in the protocol-httpclient
> plugin, but I doubt whether RFC 2617 basic authentication actually works:
> I cannot find any reference to the HttpAuthenticationFactory class.
> Did I miss something?

Unfortunately, you didn't - when I imported the plugin I left this class 
in place as a sort of reminder to complete this part... but as it is now 
it is not used.
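
For anyone picking this up, here is a rough sketch of what such a
credential lookup plus RFC 2617 Basic header construction might look
like. It is not the existing HttpAuthenticationFactory code: the class
name BasicAuthSketch, its methods, and the host name are all
hypothetical, and it assumes a modern JDK (java.util.Base64, Java 8 or
later) rather than anything in the current plugin.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch code: a tiny per-host credential store
// that produces an RFC 2617 "Basic" Authorization header value.
class BasicAuthSketch {
    // host -> "user:password". A real factory would probably also key on
    // port and realm; that is left out here for brevity.
    private final Map<String, String> credentials = new HashMap<>();

    void addCredentials(String host, String user, String password) {
        credentials.put(host, user + ":" + password);
    }

    // Returns the Authorization header value for the host,
    // or null when no credentials are configured for it.
    String authorizationFor(String host) {
        String userPass = credentials.get(host);
        return userPass == null ? null : basicHeader(userPass);
    }

    // Basic scheme: "Basic " followed by base64("user:password").
    // The charset is an assumption; ISO-8859-1 is used here.
    static String basicHeader(String userPass) {
        byte[] raw = userPass.getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        BasicAuthSketch auth = new BasicAuthSketch();
        // Example credentials taken from RFC 2617, section 2.
        auth.addCredentials("intranet.example.com", "Aladdin", "open sesame");
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println(auth.authorizationFor("intranet.example.com"));
    }
}
```

The returned string would be sent as the value of the Authorization
request header. Keeping the lookup separate from the header construction
is one way other protocol plugins could share the same credentials.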

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com