Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/01/18 15:47:11 UTC

Re: Nutch and Sharepoint authentication

I logged a JIRA issue for this [1]. I wasn't sure whether to file it as a
bug or an improvement. HttpURLConnection does work with NTLMv2, though, so
the remaining problem is integrating it into Nutch.

[1] https://issues.apache.org/jira/browse/NUTCH-1254
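
For readers following the thread, here is a minimal sketch of the
HttpURLConnection approach mentioned above, assuming the usual
java.net.Authenticator mechanism for supplying credentials. The class and
method names (NtlmFetchSketch, installCredentials, fetchStatus) are
hypothetical and are not part of Nutch. Whether the exchange actually uses
NTLMv2 depends on the JRE and the server, so treat this as a starting point
rather than a confirmed fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

/** Sketch only: names here are hypothetical and not part of Nutch. */
final class NtlmFetchSketch {

    /** Registers a JVM-wide Authenticator that answers challenges as DOMAIN\\user. */
    static void installCredentials(String domain, String user, char[] password) {
        final String login = domain + "\\" + user;
        final char[] secret = password.clone();
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(login, secret);
            }
        });
    }

    /** Issues a GET, drains the response body, and returns the HTTP status code. */
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        // Error responses (4xx/5xx) expose their body through getErrorStream().
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (body != null) {
            try (InputStream in = body) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // Discard the content; only the status matters in this sketch.
                }
            }
        }
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: NtlmFetchSketch <url> <domain> <user> <password>");
            return;
        }
        installCredentials(args[1], args[2], args[3].toCharArray());
        System.out.println("HTTP status: " + fetchStatus(args[0]));
    }
}
```

Note that Authenticator.setDefault applies to the whole JVM, which is one
thing to keep in mind when thinking about how this could fit into a
multi-threaded fetcher.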

On Tue, Dec 20, 2011 at 10:49 AM, remi tassing <ta...@gmail.com> wrote:

> Hi,
>
> I tried the code snippet from the link below and it worked! I just need to
> figure out how to integrate that into Nutch. Any help?
>
> [1]
> http://developer-resource.blogspot.com/2008/06/ntlm-authentication-from-java.html
>
>
> On Sat, Dec 17, 2011 at 3:07 PM, remi tassing <ta...@gmail.com> wrote:
>
>> How can I make Nutch use HttpURLConnection instead of HttpClient in the
>> most painless way? It's been 8 years since I wrote any Java code :-/
>>
>>
>> On Saturday, December 17, 2011, remi tassing <ta...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > According to the reference below, IIS returns an HTTP 500 response when
>> > the server expects NTLMv2 but receives an older version, which is
>> > probably what is happening here. I would guess that the HttpClient in
>> > Nutch doesn't support NTLMv2.
>> >
>> > I would also guess that it worked for Arkadi because Arkadi's server
>> > doesn't use NTLMv2.
>> >
>> > Again according to the reference, Sun JRE 5 or higher fully supports
>> > NTLMv2. I wonder why it wasn't used for Nutch.
>> >
>> > reference: http://oaklandsoftware.com/papers/ntlm.html
>> >
>> > On Wednesday, November 30, 2011, remi tassing <ta...@gmail.com>
>> wrote:
>> >> Thanks for the tips, Susam!
>> >> Unfortunately I don't have much support on the server side...
>> >> A friend tipped me off that the server may be deliberately blocking
>> >> crawlers.
>> >> So how can I make Nutch impersonate a browser?
>> >> I tried the tip in the following link, but it didn't work:
>> >> http://osdir.com/ml/nutch-user.lucene.apache.org/2009-06/msg00022.html
>> >> Remi
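
As an aside on the browser-impersonation question above: at the HTTP level
this usually comes down to the User-Agent request header. The sketch below
shows that idea with plain HttpURLConnection. The class name, method name,
and the browser string are illustrative assumptions, not Nutch code. Nutch
itself normally assembles its agent string from the http.agent.* properties
in its configuration, and a server may of course block crawlers on other
criteria as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch only: names and the agent string are hypothetical examples. */
final class BrowserAgentSketch {

    // Example browser-style value; substitute whatever string is appropriate.
    static final String BROWSER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0";

    /** Prepares (but does not yet send) a request carrying a browser-like User-Agent. */
    static HttpURLConnection openAsBrowser(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Request headers must be set before the connection is actually made.
            conn.setRequestProperty("User-Agent", BROWSER_AGENT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BrowserAgentSketch <url>");
            return;
        }
        HttpURLConnection conn = openAsBrowser(args[0]);
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Comparing the status code returned with and without the custom header is a
quick way to check whether the server really treats crawlers differently.
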
>> >> On Sun, Nov 27, 2011 at 9:17 PM, Susam Pal <su...@susam.in> wrote:
>> >>>
>> >>> On Sun, Nov 27, 2011 at 4:41 PM, remi tassing <ta...@gmail.com>
>> wrote:
>> >>> > Hello guys,
>> >>> > Following your advice, I tried tweaking the config files over the
>> >>> > weekend and ran into a problem I couldn't solve (I'm running
>> >>> > nutch-1.2; I couldn't get nutch-1.3 to run under Cygwin).
>> >>> > A sample of my log file can be found below. I have two questions:
>> >>> >   - How do I know if the NTLM login worked?
>> >>> >   - How do I debug the HTTP 500 error code? I suspect it might be
>> >>> >     due to cookies...
>> >>> > Thanks in advance for your help
>> >>> > ...
>> >>> > 2011-11-27 18:54:02,298 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
>> >>> > 2011-11-27 18:54:02,300 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
>> >>> > DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
>> >>> > DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
>> >>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >>> > INFO  fetcher.Fetcher - fetch of https://URL failed with: Http code=500, url=https://URL
>> >>> > INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
>> >>> > INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> >>> > INFO  fetcher.Fetcher - -activeThreads=0
>> >>> > ...
>> >>>
>> >>> From the logs, Nutch did attempt NTLM authentication but the server
>> >>> returned HTTP 500. That says nothing about whether the NTLM
>> >>> authentication succeeded or failed. It only indicates that the
>> >>> fetch failed, and it suggests that an internal error happened in
>> >>> SharePoint.
>> >>>
>> >>> Now, this can happen for a variety of reasons. I don't know much
>> >>> about how to troubleshoot this on the SharePoint side. Perhaps you
>> >>> should look into the IIS logs, Event Viewer, etc. to figure out why
>> >>> SharePoint didn't accept your credentials.
>> >>>
>> >>> Most likely it is some kind of configuration problem in either
>> >>> SharePoint or IIS that is causing the NTLM authentication to run
>> >>> into trouble. Even though it is outside the scope of Nutch, from my
>> >>> very limited experience working with SharePoint, I can say that it
>> >>> might be a good idea to get Microsoft technical support involved
>> >>> while trying to troubleshoot this.
>> >>>
>> >>> Regards,
>> >>> Susam Pal
>> >>> http://susam.in/