Posted to user@nutch.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2007/01/26 23:56:42 UTC
Re: Need help with form based authentication
sandeep pujar wrote:
> Greetings,
>
> I wanted to know if anybody has worked on form-based
> authentication for the Nutch crawler.
>
> Any pointers or suggestions would help.
>
I have, without much success. Form-based authentication differs
from site to site. Most sites don't use just a plain form with
username/password; they use a wide variety of methods to check or
protect the data being sent. In extreme cases a form will embed a
challenge string, run a JavaScript-based MD5 hash, and send only the
hash. In other cases other tricks are played: setting cookies,
redirecting, running JavaScript, etc. In the end perhaps only 1 out of
50 sites used plain form authentication, and even those used
different field names on the form, so I gave up.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
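The "extreme case" mentioned in the reply above (an embedded challenge string that is hashed with MD5 before submission) can be sketched roughly as follows. This is only an illustration under stated assumptions: the hidden field name "challenge", the attribute order in the markup, and the way challenge and password are combined are all hypothetical, and every site that does this tends to do it differently, which is exactly the problem described.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChallengeLoginSketch {

    // Assumption: the login page carries the challenge in markup such as
    // <input type="hidden" name="challenge" value="...">, with the name
    // attribute appearing before the value attribute.
    private static final Pattern CHALLENGE = Pattern.compile(
            "<input[^>]*name=\"challenge\"[^>]*value=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Pulls the challenge string out of the fetched login page, if present.
    static Optional<String> extractChallenge(String html) {
        Matcher m = CHALLENGE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the lowercase hex MD5 digest of the UTF-8 bytes of the input.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: the site's script hashes challenge + password. Real sites
    // may use another order, a separator, or several rounds of hashing.
    static String response(String challenge, String password) {
        return md5Hex(challenge + password);
    }
}
```

A crawler would have to fetch the login page, call extractChallenge on its body, and submit response(...) in place of the password. Because the combination rule lives in each site's own script, it has to be worked out by hand per site.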
Re: Need help with form based authentication
Posted by sandeep pujar <sa...@yahoo.com>.
Thank you for your reply, Andrzej.
I need to set this up for a single sign-on form-based
authentication. In that case, what approach do you
suggest?
I was trying to put together a solution using
Apache HttpClient, very similar to this:
http://www.java-tips.org/other-api-tips/httpclient/how-to-perform-form-based-logon.html
Thanks!
Sandeep
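For the simple case of a plain username/password form, the logon flow asked about above can be sketched as below. Note that this sketch uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than Apache HttpClient, purely so that it needs no extra jars; it is not how Nutch itself fetches pages. The class name, the login URL, and the field names are hypothetical and have to be read off the actual login form.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

final class FormLoginSketch {

    // Encodes form fields as application/x-www-form-urlencoded,
    // in the iteration order of the given map.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // A client that keeps session cookies and follows redirects, since a
    // login POST is commonly answered with a cookie plus a redirect.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Posts the form to loginUrl. A 200 status alone does not prove the
    // login worked; the caller still has to inspect the returned page.
    static HttpResponse<String> login(HttpClient client, String loginUrl,
                                      Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest post = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
        return client.send(post, HttpResponse.BodyHandlers.ofString());
    }
}
```

Subsequent requests for protected pages would need to go through the same client instance so that the session cookie set at login is sent along. Passing a LinkedHashMap keeps the fields in the order the form declares them.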