Posted to user@nutch.apache.org by Claudio Martella <cl...@tis.bz.it> on 2011/01/11 15:16:10 UTC

default authentication scheme

Hi,

I'm trying to authenticate with HTTP through DIGEST.

I set <default scheme="DIGEST"/> in my conf. The webserver supports
NTLMv2 and Digest. As NTLMv2 is not supported by httpclient, I'd like to
force it to use Digest.
The problem is that, as both NTLM and Digest are offered by the
webserver, Nutch still tries only NTLM.

Here are the logs:
2011-01-11 15:10:53,800 TRACE httpclient.Http - Credentials - username: cm; set as default for realm: 192.168.10.210:8090; scheme: digest
[...]
2011-01-11 14:40:27,781 DEBUG wire.header - << "Server: Microsoft-IIS/6.0[\r][\n]"
2011-01-11 14:40:27,781 DEBUG wire.header - << "WWW-Authenticate: Negotiate[\r][\n]"
2011-01-11 14:40:27,782 DEBUG wire.header - << "WWW-Authenticate: NTLM[\r][\n]"
2011-01-11 14:40:27,782 DEBUG wire.header - << "WWW-Authenticate: Digest qop="auth",algorithm=MD5-sess,nonce="+Upgraded+v1df768737f223ab92f2de931c95b1cb01a23417361f3196f75258730e42c66de8880242a9455ccac90972955276d42451",charset=utf-8,realm="Digest"[\r][\n]"
[...]
2011-01-11 15:10:53,885 DEBUG httpclient.HttpMethodDirector - Authorization required
2011-01-11 15:10:53,893 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
2011-01-11 15:10:53,893 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
2011-01-11 15:10:53,893 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2011-01-11 15:10:53,893 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2011-01-11 15:10:53,893 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@192.168.10.210:8090
2011-01-11 15:10:53,893 DEBUG httpclient.HttpMethodDirector - Credentials required
2011-01-11 15:10:53,893 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
2011-01-11 15:10:53,893 INFO  httpclient.HttpMethodDirector - No credentials available for NTLM <any realm>@192.168.10.210:8090

By the way, the documentation should be fixed: to see the credentials
being set in the logs, log4j must be set to TRACE, not DEBUG.

Why isn't the Digest scheme being tried at all? The credentials are set
and the server offers it. I wrote a sample httpclient application
that forces the use of Digest, and it does authenticate.

Any suggestions?

Thanks

Claudio


-- 
Claudio Martella
Digital Technologies
Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it


Re: default authentication scheme

Posted by Claudio Martella <cl...@tis.bz.it>.
Hi Susam,

I filed a JIRA with a patch:

https://issues.apache.org/jira/browse/NUTCH-958

What do you think?







Re: default authentication scheme

Posted by Claudio Martella <cl...@tis.bz.it>.
Hi Susam,

thanks for your answer. This is the code:

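import java.util.ArrayList;
import java.util.List;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.auth.AuthPolicy;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
client.getParams().setAuthenticationPreemptive(false);
// "user", "password", "client" and "host" are placeholders
Credentials defaultcreds = new NTCredentials("user", "password", "client", "host");
// Only the schemes listed here are tried, in this order
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.DIGEST);
authPrefs.add(AuthPolicy.BASIC);
// This will exclude the NTLM authentication scheme
client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);
client.getState().setCredentials(AuthScope.ANY, defaultcreds);
HttpMethod method = new GetMethod("host");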

You basically play with the scheme priorities.
Yes, I checked the code this afternoon, and I think a solution that could
work in the current Nutch code would be to set the priority explicitly.
For example, if you set Digest as the default scheme, it will try that first.
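
To illustrate the idea of setting the priority explicitly, here is a minimal sketch of how a priority list could be derived from the configured default scheme. The class and method names (AuthPrioritySketch, priorityFor) are hypothetical and are not taken from Nutch or from the NUTCH-958 patch; the scheme strings are illustrative and simply follow the order reported in the log earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Nutch: moves the configured default scheme to the front. */
class AuthPrioritySketch {

    // Same order as reported by AuthChallengeProcessor in the log earlier in this thread.
    static final List<String> LIBRARY_ORDER = Arrays.asList("NTLM", "Digest", "Basic");

    /** Returns the library order with the default scheme (if recognised) moved to the front. */
    static List<String> priorityFor(String defaultScheme) {
        List<String> priority = new ArrayList<String>();
        if (defaultScheme != null) {
            for (String scheme : LIBRARY_ORDER) {
                if (scheme.equalsIgnoreCase(defaultScheme.trim())) {
                    priority.add(scheme);
                }
            }
        }
        // Append the remaining schemes in their usual order.
        for (String scheme : LIBRARY_ORDER) {
            if (!priority.contains(scheme)) {
                priority.add(scheme);
            }
        }
        return priority;
    }

    public static void main(String[] args) {
        System.out.println(priorityFor("DIGEST")); // prints [Digest, NTLM, Basic]
        System.out.println(priorityFor(null));     // prints [NTLM, Digest, Basic]
    }
}
```

A list built this way could then be passed as the AuthPolicy.AUTH_SCHEME_PRIORITY parameter, as in the snippet above. Note that this sketch keeps NTLM as a fallback rather than excluding it; dropping it from the list entirely is the stricter variant.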

What do you think?






Re: default authentication scheme

Posted by Susam Pal <su...@gmail.com>.
Hi Claudio,

I worked on this a long time ago. As far as I remember, the Apache
Jakarta Commons HttpClient library attempts NTLM authentication
if 'NTLM' is found in a 'WWW-Authenticate' header of the HTTP
response. It ignores 'Digest' in that case because the NTLM
authentication scheme is believed to be more secure than the Digest
authentication scheme. If you want to confirm this behaviour, you
could try the Jakarta Commons HttpClient mailing list:
http://hc.apache.org/httpclient-3.x/mail-lists.html
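
The behaviour described above can be modelled roughly as "pick the first scheme in the preference list that the server also offered". The following is only a simplified sketch under that assumption, not the HttpClient source; the class and method names are made up for illustration, and the header values are shortened versions of the ones in the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Simplified model of preference-based scheme selection; not the HttpClient source. */
class SchemeSelectionSketch {

    /** Extracts the scheme token (first word) from each WWW-Authenticate value. */
    static List<String> offeredSchemes(List<String> wwwAuthenticateValues) {
        List<String> schemes = new ArrayList<String>();
        for (String value : wwwAuthenticateValues) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String token = trimmed.split("\\s+", 2)[0];
            schemes.add(token.toLowerCase(Locale.ROOT));
        }
        return schemes;
    }

    /** Returns the first preferred scheme that the server offered, or null if none match. */
    static String select(List<String> preference, List<String> offered) {
        for (String scheme : preference) {
            String candidate = scheme.toLowerCase(Locale.ROOT);
            if (offered.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> offered = offeredSchemes(Arrays.asList(
                "Negotiate", "NTLM", "Digest qop=\"auth\",algorithm=MD5-sess,realm=\"Digest\""));
        // Default order seen in the log: NTLM wins although Digest is also offered.
        System.out.println(select(Arrays.asList("ntlm", "digest", "basic"), offered)); // prints ntlm
        // With NTLM left out of the preference list, Digest is chosen instead.
        System.out.println(select(Arrays.asList("digest", "basic"), offered)); // prints digest
    }
}
```

Under this model, Digest is never reached as long as NTLM is both preferred and offered, which matches the "ntlm authentication scheme selected" line in the log.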

Could you please share the code where you are able to force the usage
of Digest authentication scheme?

This is the source file where the code for authentication in Nutch is
written: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java?view=markup

In lines 334 to 337 and 402 to 405, you can find that all credentials
are set as NTCredentials objects. I don't have a solution for you yet.
But I hope that this information would help you in some manner.

Regards,
Susam Pal
