Posted to user@nutch.apache.org by Michael Piccuirro <mi...@gmail.com> on 2008/09/03 17:10:32 UTC
Re: A problem for web site needing username & password
If you're talking about basic HTTP authentication, I had the same problem
using Nutch 0.9. I saw a few articles explaining how to do it by modifying
config files, but nothing worked. So as a messy quick fix I just modified
this file:
src\plugin\protocol-httpclient\src\java\org\apache\nutch\protocol\httpclient\Http.java
I just grab a username/password from the config:
String basicUsername = conf.get("http.auth.basic.username");
String basicPassword = conf.get("http.auth.basic.password");
// then set the credentials like this:
Credentials ntCreds = new NTCredentials(ntlmUsername, ntlmPassword,
    ntlmHost, ntlmDomain);
client.getState().setCredentials(
    new AuthScope(ntlmHost, AuthScope.ANY_PORT), ntCreds);
if (LOG.isInfoEnabled()) {
  LOG.info("**** setting basic auth credentials ****");
}
client.getParams().setAuthenticationPreemptive(true);
client.getState().setCredentials(
    new AuthScope("www.mydomain.com", AuthScope.ANY_PORT,
        AuthScope.ANY_REALM),
    new UsernamePasswordCredentials(basicUsername, basicPassword));
Not the best way to do this, but it'll work.
Change www.mydomain.com to your domain.
Another way around it is to have Nutch go through a proxy and have the
proxy tack on the auth header. I was using CharlesProxy. Again, not the
best way to do this at all, but it'll get you going.
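For anyone wondering what "the auth header" actually is: with Basic auth it
is just the username and password joined by a colon, Base64-encoded, and sent
as an Authorization header. Here is a rough sketch of building that value by
hand. The class name BasicAuthHeader and the placeholder credentials are made
up for illustration and are not part of Nutch or commons-httpclient, and
java.util.Base64 assumes Java 8 or newer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Nutch or commons-httpclient.
class BasicAuthHeader {

    // Builds the Authorization header value for HTTP Basic auth:
    // "Basic " followed by base64(username + ":" + password).
    static String build(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials (an assumption); substitute your own.
        System.out.println("Authorization: " + build("crawler", "secret"));
    }
}
```

A proxy that adds this header to every outgoing request has roughly the same
effect as the preemptive setting above. Keep in mind that Base64 is an
encoding, not encryption, so the password is readable by anyone who can see
the request unless the connection uses HTTPS.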
On Mon, Jul 28, 2008 at 2:53 AM, zhengsj03 User <zh...@163.com> wrote:
> Hi!
> Many web sites need a username and password to log in. If I want to
> crawl a web site like this, and I know the username and password, how can
> I let the crawler know them so it can log in to the site like a human
> would? How can I change the configuration files?
> Thanks!
>
>
>
Re: A problem for web site needing username & password
Posted by zhengsj03 User <zh...@163.com>.
These days I have tried to solve the problem by modifying the source
code, but failed.
I think your method will help me. I will try it. Thanks!