Posted to httpclient-users@hc.apache.org by jy...@aol.com on 2007/07/03 05:54:12 UTC

Re: How to "mimic a browser" for threaded web sites?

It looks like I was way off base on this one. For the moment, forget my hypothesis of multithreading. This web site has something much more interesting that I did not understand at the time. I used the methodology in your primer, "ForAbsoluteBeginners", to do a study of this website. This is what I did ... and what I found.

First, I set up a program to GET the Logon Page. Here is the program I used. As you can see, except for the URL, it is exactly the sample program in the HttpClientTutorial.

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

import java.io.*;

public class ConnectToSiteNew {

  private static String url = "https://ais4.tiaa-cref.org/customerinquiry/accountHome.do";

  public static void main(String[] args) {
    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide custom retry handler if necessary
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and is not binary data
      System.out.println(new String(responseBody));

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }
  }
}

The first hint that there is something unusual here is that the URL appears to refer to a script. You can run the above Java program. It works and retrieves a Logon page that looks the same as the one you would get with a browser. The difference is that the Java program skips the home page and goes directly to the "Logon page". (I also tried this with a browser, instead of going to the home page "tiaa-cref.org" and clicking the "logon" button.) The unusual feature of the Logon Page is that it is different for each user. Each user gets his/her own custom-generated logon form. You can see that the program takes a while to execute while the Logon Page is generated, but it does work. I dumped the standard output (System.out) to a file so that I could examine it.

Next I did an analysis of the Logon form. I searched it for <input .../> statements. I found the two usual ones for entering the user id and password:

<input type="text" tabindex="1" name="user" id="user" .../>
<input type="password" tabindex="2" name="password" .../>

I also found some statements that assign constant values to certain names:

<input type="hidden" name="DK" value="" />
<input type="hidden" name="SMAUTHREASON" value="0" />

But the interesting ones were the following three that are unique to my session:

<input type="hidden" name="TARGET" value="https://ais4.tiaa-cref.org/selfservices/secureresource/redirect.do?targetURL=https://ais4.tiaa-cref.org/customerinquiry/accountHome.do"/>

<input type="hidden" name="SMAGENTNAME" value="vAWNg3iV8aADFepETR44Ovi5r0zNV8p2k6u11LgIee9yVDlbNk3m1lHN1QOMpE3h" />

<input type="hidden" name="REALMOID" value="06-000ad955-678a-1334-9f02-83ab87ebff3f" />

Pressing the "submit" button on the logon form seems to submit the logon form to yet another script:

<input type="image" class="submit_button" src="../docs/images/login.png" alt="Log in" onclick='return submitLoginForm("https://ais4.tiaa-cref.org/forms/tiaacref.fcc","/selfservices/sso/login.do?command=validateForm" )'/>

What I would like to do is expand the above program to: 

1) GET the logon form (I have already done this)

2) "hold onto" the form while I insert the values for user id and password

3) submit the form and follow the redirects as usual.

How do I do 2) and 3)?

Jerry

-----Original Message-----
From: Roland Weber <os...@dubioso.net>
To: HttpClient User Discussion <ht...@jakarta.apache.org>
Sent: Sun, 8 Apr 2007 1:27 pm
Subject: Re: How to "mimic a browser" for threaded web sites?

Hi Jerry,

> Sorry to take such a long time between requests for help. Before I make
> major changes to my application, how can I test to see whether I can still
> use the httpclient "simple connection" and mimic the sequential requests as
> you suggest or whether I really need a multithreaded connection?

The hard way: trial and error. I don't know of any spec that
*requires* a browser to open multiple connections, so I find
it hard to believe that any web application would rely on that.
Even if there are multiple windows, that doesn't mean that
more than one of them is executing a request at any one time.
Requests are most often generated by the user clicking some
link or button, and users don't click in multiple windows at
the same time.

Just analyse the web application as you used to, noting cases
where the returned page is displayed in a different window.
If it is plain HTML, that is done by the target="windowname"
attribute in links. Then try to run the sequence of requests
generated by multiple windows from a single thread.

hope that helps,
  Roland


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

Re: How to "mimic a browser" for threaded web sites?

Posted by Roland Weber <os...@dubioso.net>.
Hi Jerry,

this might become the longest-running thread in
the history of this mailing list :-)

> 1) GET the logon form (I have already done this)

Check.

> 2) "hold onto" the form while I insert the values for user id and password

There is nothing to be done to "hold onto" anything.
The server delivered the page and prepared the session.
If the client was a browser, the user might also take
several minutes to respond. The client does not keep
an open connection to send heartbeats or anything.

Just read the form, parse the HTML, then prepare and
submit the next request.

> 3) submit the form and follow the redirects as usual.

You will have to parse the HTML to extract the session
specific form values. I recommend a simple algorithm
based on standard Java regular expressions. Then cut
out the session specific values and prepare a POST
request with all required name/value pairs, as described
in the primer.
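
A minimal sketch of that approach, using only the standard Java library. The class and method names (HiddenFieldExtractor, extractHiddenFields, toFormBody) are made up for illustration, and the regular expressions assume double-quoted attribute values like those in the form excerpts quoted earlier in this thread. It does not decode HTML entities, and it does not work out which URL the form is finally posted to.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient.
class HiddenFieldExtractor {

    // One <input ...> tag; the attribute text ends up in group 1.
    private static final Pattern INPUT_TAG =
        Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // One attribute of the form name="value" (double quotes only).
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*\"([^\"]*)\"");

    // Collects name/value pairs of all <input type="hidden"> tags, in page order.
    static Map<String, String> extractHiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            String type = null;
            String name = null;
            String value = "";
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase();
                if (key.equals("type")) {
                    type = attr.group(2);
                } else if (key.equals("name")) {
                    name = attr.group(2);
                } else if (key.equals("value")) {
                    value = attr.group(2);
                }
            }
            if ("hidden".equalsIgnoreCase(type) && name != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    // Encodes the pairs as an application/x-www-form-urlencoded request body.
    static String toFormBody(Map<String, String> fields) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two of the inputs quoted earlier in the thread, plus the user field.
        String html =
            "<input type=\"hidden\" name=\"SMAUTHREASON\" value=\"0\" />\n"
            + "<input type=\"hidden\" name=\"REALMOID\""
            + " value=\"06-000ad955-678a-1334-9f02-83ab87ebff3f\" />\n"
            + "<input type=\"text\" tabindex=\"1\" name=\"user\" id=\"user\" />";

        Map<String, String> fields = extractHiddenFields(html);
        fields.put("user", "my-user-id");      // placeholder credentials
        fields.put("password", "my-password");
        System.out.println(toFormBody(fields));
    }
}
```

With HttpClient 3.x the encoding step is probably unnecessary: each pair from the map can be handed to a PostMethod via addParameter(name, value), and the same HttpClient instance that fetched the form should be reused so that any session cookies are sent along.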

hope that helps,
  Roland

