Posted to user@nutch.apache.org by Jim Wilson <wi...@gmail.com> on 2006/09/08 20:05:49 UTC

Fetching past Authentication

Dear Nutch User List,

I am desperately trying to index an Intranet with the following
characteristics:

1) Some sites require no authentication - these already work great!
2) Some sites require basic HTTP Authentication.
3) Some sites require NTLM Authentication.
4) No sites require both basic HTTP and NTLM (only one or the other).
5) The same Username/Password should work on all sites which require either
type of Authentication.
6) For sites requiring NTLM Authentication, the same Domain is always used.
7) If a site requires authentication but the Username/Password mentioned
above fails, the site doesn't matter and does not need to be fetched/indexed.
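Taken together, rules 1-4 and 7 describe a small per-URL decision. Below is a minimal Java sketch of it. It assumes the HTTP client exposes the status code and the raw WWW-Authenticate values, and that it carries out NTLM's multi-step handshake on its own, so only the final status is seen here. AuthPolicy, Action and decide are hypothetical names, not part of Nutch:

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical per-URL decision following rules 1-4 and 7; not a Nutch API. */
final class AuthPolicy {

    enum Action { USE_RESPONSE, RETRY_WITH_BASIC, RETRY_WITH_NTLM, SKIP }

    private AuthPolicy() {}

    /**
     * @param status          status code of the most recent response
     * @param challenges      raw WWW-Authenticate header values from that response
     * @param credentialsSent true if that response answered our default credentials
     */
    static Action decide(int status, List<String> challenges, boolean credentialsSent) {
        if (status != 401) {
            return Action.USE_RESPONSE;         // rule 1: no challenge, handle as usual
        }
        if (credentialsSent) {
            return Action.SKIP;                 // rule 7: default credentials rejected
        }
        // Rule 4 says only one of Basic/NTLM is required, but a server may still
        // list several schemes (e.g. "Negotiate" and "NTLM"), so scan them all.
        for (String challenge : challenges) {
            String scheme = challenge.trim().split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
            if (scheme.equals("basic")) {
                return Action.RETRY_WITH_BASIC; // rule 2
            }
            if (scheme.equals("ntlm")) {
                return Action.RETRY_WITH_NTLM;  // rule 3
            }
        }
        return Action.SKIP;                     // no supported scheme offered
    }
}
```

For example, a first 401 carrying `Basic realm="intranet"` yields RETRY_WITH_BASIC. A second 401 after those credentials were sent yields SKIP, matching rule 7.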

My question is this: How can I provide a default Username/Password/Domain
for Nutch to use when answering HTTP or NTLM challenges?
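Outside Nutch, one way to supply a single default username/password/domain in Java is the JDK's java.net.Authenticator. HttpURLConnection consults it when a server issues an authentication challenge. The sketch below is hypothetical. DefaultCredentials is an illustrative name, and the sketch assumes a fetcher built on HttpURLConnection, since Nutch's own protocol plugins may not consult this hook at all. It also assumes the JDK's NTLM support splits a `DOMAIN\user` name into domain and user, as the Sun/OpenJDK implementation does:

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical holder of one default credential set (rules 5 and 6); not part of Nutch. */
final class DefaultCredentials extends Authenticator {
    private final String user;
    private final String password;
    private final String ntlmDomain;

    DefaultCredentials(String user, String password, String ntlmDomain) {
        this.user = user;
        this.password = password;
        this.ntlmDomain = ntlmDomain;
    }

    /** Value of an "Authorization" header for HTTP Basic: base64("user:password"). */
    String basicAuthHeader() {
        // UTF-8 is an assumption; it only matters for non-ASCII credentials.
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Invoked by the JDK when a server (or proxy) challenges a request. */
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // For NTLM, prefix the fixed domain; every other scheme gets the bare name.
        String name = "NTLM".equalsIgnoreCase(getRequestingScheme())
                ? ntlmDomain + "\\" + user
                : user;
        return new PasswordAuthentication(name, password.toCharArray());
    }
}
```

Installing it with `Authenticator.setDefault(new DefaultCredentials("user", "secret", "DOMAIN"))` makes every HttpURLConnection in that JVM answer challenges with the same credentials. `basicAuthHeader()` is only needed by a fetcher that writes its own request headers.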

(I really hope all I need is a couple of <property> tags in my
nutch-site.xml, but I'm beginning to doubt it).

I love Nutch, and really want to use it.  Please help if you know the
answer.  Thanks!

-- Jim R. Wilson

Re: Fetching past Authentication

Posted by Jim Wilson <wi...@gmail.com>.
Yeah, I saw that page too.  It looks like it's been done, but there's no
mention of how to do it.

This page on the wiki seems to indicate that a person by the name of Ken
Meltsner had at least a partial solution:

http://wiki.apache.org/nutch/TaskList?highlight=%28Authentication%29

Anyone?  Does anybody know how to beat authentication?  Thanks in advance.

-- Jim

On 9/9/06, Tomi NA <he...@gmail.com> wrote:
> I'm also very interested in hearing more on the topic.
> The only mention of a solution to (a part of) this problem I found is
> http://www.dehora.net/journal/2005/11/nutch_with_basic_authentication.html
>
> t.n.a.
>

Re: Fetching past Authentication

Posted by Tomi NA <he...@gmail.com>.
I'm also very interested in hearing more on the topic.
The only mention of a solution to (a part of) this problem I found is
http://www.dehora.net/journal/2005/11/nutch_with_basic_authentication.html

t.n.a.