Posted to user@nutch.apache.org by mbehlok <m_...@hotmail.com> on 2013/02/06 17:14:47 UTC

ASP.net - HTTP POST - javascript submit methods.

Hello,

I'm new to Nutch and have been crawling simple HTML pages with no problem. Now
I'm trying to crawl a page that chooses its content depending on HTTP POST
parameters. Nutch doesn't seem to crawl beyond "submit" forms.

Here's my situation: the link I want to crawl is
href="javascript:__doPostBack('param1','0')", and all that method does is
call the JavaScript method myForm.submit(). My question: does the Nutch
source code need to be changed to crawl these links, or is it a matter of
configuration? Which classes are involved?
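
For illustration, the two arguments carried by a link like the one above can be pulled out with a regular expression. This is a minimal standalone Python sketch, not Nutch code; the function name parse_postback is hypothetical, and it assumes the href always uses single-quoted arguments as in the example.

```python
import re

# Matches hrefs of the form javascript:__doPostBack('target','argument').
# Assumption: both arguments are single-quoted, as in the link quoted above.
DO_POSTBACK = re.compile(
    r"javascript:__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_postback(href):
    """Return (event_target, event_argument), or None if href is not a postback link."""
    match = DO_POSTBACK.match(href.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_postback("javascript:__doPostBack('param1','0')"))
```

A crawler that sees such an href cannot follow it as a URL; it can only recover the two values and replay the form submission itself.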

thanks,
mitch.



--
View this message in context: http://lucene.472066.n3.nabble.com/ASP-net-HTTP-POST-javascript-submit-methods-tp4038807.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: ASP.net - HTTP POST - javascript submit methods.

Posted by Tejas Patil <te...@gmail.com>.
I don't think there is any configuration parameter for this. You will
have to write code that sends the POST and GET requests with the
relevant information in them. It won't be a simple 5-6 line change;
it will take more effort than that.

Thanks,
Tejas Patil
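
As a rough idea of what "sending a POST with the relevant info" involves for an ASP.NET Web Forms page: __doPostBack normally copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the form together with the other hidden fields (such as __VIEWSTATE). The Python sketch below only builds that request and does not send it. It is not Nutch code; the names HiddenFieldCollector and build_postback_request are hypothetical, and it assumes the form posts back to the page's own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def build_postback_request(page_url, page_html, event_target, event_argument):
    """Build (but do not send) the POST that a __doPostBack call would submit.

    Assumption: the form's action is the page URL itself.
    """
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    # These two fields carry the arguments of __doPostBack(target, argument).
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    body = urlencode(fields).encode("utf-8")
    return Request(
        page_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical page content and URL, for demonstration only.
    html = '<form><input type="hidden" name="__VIEWSTATE" value="abc" /></form>'
    req = build_postback_request("http://example.com/page.aspx", html, "param1", "0")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Inside Nutch, the equivalent logic would have to live in custom code, since each postback target yields a different page behind the same URL.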

