Posted to user@nutch.apache.org by Susheel Kumar <su...@gmail.com> on 2019/07/02 16:23:05 UTC
IllegalArgumentException: No form exists: user-login-form
Hello Nutch Users,
I am a first-time Nutch user and have been trying to crawl an intranet
portal, https://pilot.mysite.sitecorp.com/user/login, using Nutch 1.15. I
always get the "No form exists: user-login-form" error shown below. I tried
crawling another login page, https://urs.earthdata.nasa.gov/, and did not
see this error, but for the intranet site I get it every time.
I tried loading the same login page using Selenium ChromeDriver, and it
does load and fill in the user id/password text boxes.
What could be wrong? How can I troubleshoot this further?
Thanks in advance.
2019-07-02 10:36:59,152 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy
2019-07-02 10:36:59,153 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1
2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable()
2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection()
2019-07-02 10:36:59,153 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager.
2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.releaseConnection(HttpConnection)
2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Freeing connection, hostConfig=HostConfiguration[host=https://pilot.mysite.sitecorp.com]
2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
2019-07-02 10:36:59,153 DEBUG util.IdleConnectionHandler - Adding connection at: 1562078219153
2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are no waiting threads
2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form element found with 'id' = user-login-form, trying 'name'.
2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form element found with 'name' = user-login-form
2019-07-02 10:36:59,205 ERROR httpclient.Http - Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
    at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:500)
    at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:177)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:320)
    at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:343)
Caused by: java.lang.IllegalArgumentException: No form exists: user-login-form
    at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
    at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
    at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:498)
    ... 3 more
2019-07-02 10:36:59,209 INFO fetcher.FetcherThread - FetcherThread 41 fetch of https://pilot.mysite.sitecorp.com/user/login failed with: java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
2019-07-02 10:36:59,210 INFO fetcher.FetcherThread - FetcherThread 41 has no more work available
2019-07-02 10:36:59,210 INFO fetcher.FetcherThread - FetcherThread 41 -finishing thread FetcherThread, activeThreads=0
2019-07-02 10:36:59,215 INFO mapreduce.Job - Job job_local487279790_0001 running in uber mode : false
2019-07-02 10:36:59,216 INFO mapreduce.Job - map 0% reduce 0%
2019-07-02 10:36:59,635 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2019-07-02 10:36:59,635 INFO fetcher.Fetcher - -activeThreads=0
2019-07-02 10:37:00,218 INFO mapreduce.Job - map 100% reduce 100%
2019-07-02 10:37:00,218 INFO mapreduce.Job - Job job_local487279790_0001 completed successfully
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
> What could be going wrong with actual site? How can i debug/troubleshoot
> further?
Make sure the HTML source code contains the correct <form> element:
- do not use a browser (or disable JavaScript)
- use curl or wget instead to download the page
- always be aware that the DOM tree in a web browser may look different
  from the one obtained by parsing the bare HTML page
- obviously, Dropbox isn't an appropriate host for testing and debugging
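The check described in the list above can be sketched in a few lines of Python. This is only an illustration, not Nutch's own code: it uses Python's standard html.parser as a stand-in for whatever parser the plugin uses, and the helper names (FormCollector, find_forms, has_form) are made up for this example. It mirrors the lookup order seen in the log, first by 'id' and then by 'name'.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect the id and name attributes of every <form> start tag."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "form":
            attr_map = dict(attrs)
            self.forms.append((attr_map.get("id"), attr_map.get("name")))


def find_forms(html_text):
    """Return a list of (id, name) pairs for all forms in the raw HTML."""
    parser = FormCollector()
    parser.feed(html_text)
    parser.close()
    return parser.forms


def has_form(html_text, form_id):
    """Look for a form by 'id' first, then fall back to 'name'."""
    forms = find_forms(html_text)
    if any(fid == form_id for fid, _ in forms):
        return True
    return any(name == form_id for _, name in forms)


# The form tag quoted elsewhere in this thread, closed so it stands alone.
sample = (
    '<form class="user-login-form" action="/user/login" '
    'method="post" id="user-login-form" accept-charset="UTF-8"></form>'
)
print(has_form(sample, "user-login-form"))
```

To test a real page, save it with curl or wget, read the file into a string, and pass that string to has_form together with the loginFormId value from httpclient-auth.xml. If the result is False for the downloaded page, the form is most likely not present in the HTML that the server returns to a non-browser client.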
Good luck!
Sebastian
On 7/10/19 4:21 AM, Susheel Kumar wrote:
> It looks like when i run the html page from my local tomcat
> http://localhost:8082/mysite/ I am not getting the "no form exist" error.
>
> What could be going wrong with actual site? How can i debug/troubleshoot
> further?
>
> Thanks,
> Susheel
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Susheel Kumar <su...@gmail.com>.
It looks like when I serve the HTML page from my local Tomcat
(http://localhost:8082/mysite/), I do not get the "No form exists" error.
What could be going wrong with the actual site? How can I debug/troubleshoot
further?
Thanks,
Susheel
On Tue, Jul 9, 2019 at 10:08 PM Susheel Kumar <su...@gmail.com> wrote:
> Thanks for the idea Sebastian. Let me try that.
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Susheel Kumar <su...@gmail.com>.
Thanks for the idea Sebastian. Let me try that.
On Tue, Jul 9, 2019 at 10:15 AM Sebastian Nagel
<wa...@googlemail.com.invalid> wrote:
> Hi Ryan,
>
> there is one:
>
> <form class="user-login-form" data-drupal-selector="user-login-form"
> action="/user/login"
> method="post" id="user-login-form" accept-charset="UTF-8">
>
> But you would need to copy the content out from dropbox, put the page on
> your own server
> and try it.
>
> Best,
> Sebastian
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Ryan,
there is one:
<form class="user-login-form" data-drupal-selector="user-login-form" action="/user/login"
method="post" id="user-login-form" accept-charset="UTF-8">
But you would need to copy the content out of Dropbox, put the page on your own server,
and try it.
Best,
Sebastian
On 7/9/19 3:21 PM, Ryan Suarez wrote:
> ok, so the error message is quite clear. There is no form on that link
> you provided with an id or name of 'user-login-form'.
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Ryan Suarez <ry...@sheridancollege.ca>.
ok, so the error message is quite clear. There is no form on that link
you provided with an id or name of 'user-login-form'.
On Mon, 2019-07-08 at 22:39 -0400, Susheel Kumar wrote:
> Hello Sebastian,
>
> Thanks for getting back. Here is the Login.html link which is
> throwing no
> form exists error.
>
> https://www.dropbox.com/s/jkts0eogarfs03j/Log%20in%20.html?dl=0
>
> Please take a look and suggest what could be wrong when trying to
> sign in
> to this site.
>
> Also below content of auth-configuration section of httpclient-
> auth.xml
>
> ---
> <credentials authMethod="formAuth"
> loginUrl="https://qa.mysite.sitecorp.com/user/login"
> loginFormId="user-login-form"
> loginRedirect="false">
> <loginPostData>
> <field name="name"
> value="Crawler"/>
> <field name="pass"
> value="spid3r_us"/>
> </loginPostData>
> <additionalPostHeaders>
> <field name="User-Agent"
> value="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100
> Safari/537.36"
> />
> </additionalPostHeaders>
> <removedFormFields>
> <field name="ctl00$MainContent$LoginUser$RememberMe"/>
> </removedFormFields>
> <loginCookie>
> <policy>BROWSER_COMPATIBILITY</policy>
> </loginCookie>
> </credentials>
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Susheel Kumar <su...@gmail.com>.
Hello Sebastian,
Thanks for getting back. Here is a link to the login page HTML that triggers
the "No form exists" error:
https://www.dropbox.com/s/jkts0eogarfs03j/Log%20in%20.html?dl=0
Please take a look and suggest what could be wrong when trying to sign in
to this site.
Below is the content of the auth-configuration section of httpclient-auth.xml:
---
<credentials authMethod="formAuth"
             loginUrl="https://qa.mysite.sitecorp.com/user/login"
             loginFormId="user-login-form"
             loginRedirect="false">
  <loginPostData>
    <field name="name" value="Crawler"/>
    <field name="pass" value="spid3r_us"/>
  </loginPostData>
  <additionalPostHeaders>
    <field name="User-Agent"
           value="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"/>
  </additionalPostHeaders>
  <removedFormFields>
    <field name="ctl00$MainContent$LoginUser$RememberMe"/>
  </removedFormFields>
  <loginCookie>
    <policy>BROWSER_COMPATIBILITY</policy>
  </loginCookie>
</credentials>
On Wed, Jul 3, 2019 at 10:22 AM Sebastian Nagel
<wa...@googlemail.com.invalid> wrote:
> Hi,
>
> the error message is quite clear:
>
> > 2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form
> > element found with 'id' = user-login-form, trying 'name'.
> > 2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form
> > element found with 'name' = user-login-form
>
> But without access to the login page content, it's nearly impossible to
> determine
> what's going wrong.
>
>
> > I tried crawling the same url/login page using Selenium Chrome Drive and
> it
> > does load and fill in the user id/pwd text boxes.
>
> Sounds like the page HTML source looks different with Selenium. Note that
> the
> protocol-httpclient does not modify the DOM tree via Javascript, it is
> derived
> from the bare HTML only. That could be a reason why the form element is
> not found
> while it works in a browser (emulation).
>
>
> Best,
> Sebastian
>
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi,
the error message is quite clear:
> 2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form
> element found with 'id' = user-login-form, trying 'name'.
> 2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form
> element found with 'name' = user-login-form
But without access to the login page content, it's nearly impossible to determine
what's going wrong.
> I tried crawling the same url/login page using Selenium Chrome Drive and it
> does load and fill in the user id/pwd text boxes.
Sounds like the page HTML source looks different with Selenium. Note that
protocol-httpclient does not modify the DOM tree via JavaScript; the tree is derived
from the bare HTML only. That could be why the form element is not found
while it works in a browser (emulation).
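To illustrate the point, here is a small Python sketch. It is an assumption-laden toy, not the plugin's code: Python's html.parser stands in for a parser that reads only the served HTML, and both pages are invented examples. In the first page the form exists only inside a script, so a parser that does not execute JavaScript never sees a <form> tag.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every start tag found in the static markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def static_tags(html_text):
    """Return the start tags seen without running any scripts."""
    parser = TagLister()
    parser.feed(html_text)
    parser.close()
    return parser.tags


# Hypothetical page: the form only appears after the script runs in a browser.
js_page = """
<html><body>
<div id="root"></div>
<script>
  document.getElementById("root").innerHTML =
    '<form id="user-login-form" method="post"></form>';
</script>
</body></html>
"""

# Hypothetical page: the form is part of the HTML sent by the server.
static_page = (
    '<html><body><form id="user-login-form" method="post"></form>'
    "</body></html>"
)

print("form" in static_tags(js_page))      # False: script body is plain text
print("form" in static_tags(static_page))  # True
```

A browser or Selenium would run the script in the first page and show the form, which would explain a difference between the two tools if the real login page builds its form this way.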
Best,
Sebastian
Re: IllegalArgumentException: No form exists: user-login-form
Posted by Susheel Kumar <su...@gmail.com>.
Any insight into this error?
On Tue, Jul 2, 2019 at 12:23 PM Susheel Kumar <su...@gmail.com> wrote:
> Hello Nutch Users,
>
> I am a first time Nutch user and been trying to crawl an intranet portal *https://pilot.mysite.sitecorp.com/user/login
> <https://pilot.mysite.sitecorp.com/user/login>* using Nutch 1.15 and I
> am always getting below "No form exists: user-login-form" error. I tried
> crawling other login page like https://urs.earthdata.nasa.gov/ and do not
> see such error but for this intranet site I am always getting this error.
>
> I tried crawling the same url/login page using Selenium Chrome Drive and
> it does load and fill in the user id/pwd text boxes.
>
> What could be wrong. How can i further troubleshoot this?
>
> Thanks in advance.