Posted to httpclient-users@hc.apache.org by Roland Weber <RO...@de.ibm.com> on 2005/10/12 08:04:49 UTC

Re: [HttpClient] Number of open sockets increase on session bean redeployment

Hi Tom,

HttpClient is neither a J2EE application nor an EJB.
If you use it from an EJB, initialization and cleanup
are your responsibility. The only one on this mailing
list who knows what your EJB code is doing is you.

I *suspect* that your EJB creates a new HttpClient
object on redeployment and, more importantly, a new
MultiThreadedHttpConnectionManager. The new
connection manager will create its own connections
to manage, and knows nothing about the old one.
The connections in the old connection manager
will probably not be cleaned up until the old EJB
gets garbage collected.

hope that helps,
  Roland
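
One way to act on this is to tie the connection manager's lifetime to the bean's lifecycle hooks. The sketch below models only the pattern, using standard Java: PooledManager and ClientBean are hypothetical stand-ins, not HttpClient or EJB classes. In HttpClient 3.x the corresponding cleanup call would be MultiThreadedHttpConnectionManager.shutdown(). Whether and when the container invokes a removal hook such as ejbRemove() on redeployment depends on the application server, so treat that part as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a connection manager that owns open sockets. */
class PooledManager {
    // Connections currently open across ALL managers in this JVM.
    static int openConnections = 0;
    private final List<String> connections = new ArrayList<String>();
    private boolean shutDown = false;

    /** Simulates opening one pooled, persistent connection. */
    void open(String host) {
        if (shutDown) {
            throw new IllegalStateException("manager is shut down");
        }
        connections.add(host);
        openConnections++;
    }

    /** Closes everything this manager owns; safe to call more than once. */
    void shutdown() {
        if (shutDown) {
            return;
        }
        shutDown = true;
        openConnections -= connections.size();
        connections.clear();
    }
}

/** Hypothetical bean: builds its manager on creation, releases it on removal. */
class ClientBean {
    private PooledManager manager;

    void create() {                 // would correspond to ejbCreate()
        manager = new PooledManager();
    }

    void call(String host) {
        manager.open(host);
    }

    void remove() {                 // would correspond to ejbRemove()
        if (manager != null) {
            manager.shutdown();
            manager = null;
        }
    }
}

class RedeployDemo {
    /** Deploys twice, two connections each; returns connections still open. */
    static int simulate(boolean cleanUpOnRemove) {
        PooledManager.openConnections = 0;
        ClientBean old = null;
        for (int deployment = 0; deployment < 2; deployment++) {
            if (old != null && cleanUpOnRemove) {
                old.remove();       // explicit cleanup of the previous deployment
            }
            ClientBean bean = new ClientBean();
            bean.create();
            bean.call("example.org");
            bean.call("example.org");
            old = bean;
        }
        return PooledManager.openConnections;
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + simulate(false));
        System.out.println("with cleanup:    " + simulate(true));
    }
}
```

Without the explicit shutdown, the first deployment's connections stay counted as open after the second deployment starts, which matches the growth described in the original question.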




"Tom Zaranek" <tz...@loyalty.com> 
wrote on 11.10.2005 22:57
To: <ht...@jakarta.apache.org>
Please respond to: "HttpClient User Discussion"
Subject: [HttpClient] Number of open sockets increase on session bean redeployment

I am using the MultiThreadedHttpConnectionManager and reuse the HttpClient
instance from within a stateless session EJB.  Each POST runs in a
try-catch-finally block, and I call releaseConnection() in the finally
clause once the method completes.  I read the response to the POST
with getResponseBodyAsString().

It appears that the number of open sockets climbs to
MAX_SOCKETS_PER_HOST_CONNECTIONS given high enough load.  When I
redeploy the application (ear file), however, the sockets stay open and
an additional MAX_SOCKETS_PER_HOST_CONNECTIONS sockets will be created
under the same load.  If the application server gets restarted, all of the
socket connections will be dropped.  Note that the connections initially
opened should stay open since they are set to persist.  But on redeployment
I would expect the previously opened sockets to close.

Can someone explain this, or suggest a way to stop the number of sockets
from increasing after each redeployment?  It almost appears that the
Connection Manager does not get destroyed on application redeployment,
and I wonder whether that makes sense.
Any help would be appreciated.

Tom

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org






RE: How can I find certain words in a html page?

Posted by Graeme <co...@hotmail.com>.
OK, thanks. I should be able to search for those two things in the string
and then get the substring of what is between them.
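
A minimal sketch of that indexOf/substring approach, using only standard Java. The class name BetweenMarkers and the sample markers are assumptions for illustration; the real markers depend on the exact whitespace in the page.

```java
import java.util.ArrayList;
import java.util.List;

class BetweenMarkers {
    /** Returns the trimmed text found between each start/end marker pair. */
    static List<String> extract(String text, String start, String end) {
        List<String> found = new ArrayList<String>();
        int from = 0;
        while (true) {
            int s = text.indexOf(start, from);
            if (s < 0) {
                break;                       // no further start marker
            }
            int contentStart = s + start.length();
            int e = text.indexOf(end, contentStart);
            if (e < 0) {
                break;                       // start marker without a matching end
            }
            found.add(text.substring(contentStart, e).trim());
            from = e + end.length();         // continue after this end marker
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real page would come from the HTTP response.
        String page = "<tr><td class=\"alt1\">42</td></tr>"
                    + "<tr align=\"center\"><td class=\"alt1\">17</td></tr>";
        System.out.println(extract(page, "<td class=\"alt1\">", "</td>"));
    }
}
```

Exact-string markers are brittle if the page's whitespace or attribute order changes, which is where a regular expression can be more forgiving.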

-----Original Message-----
From: Thom Hehl [mailto:thom@nowhereatall.com] 
Sent: 13 October 2005 11:33
To: HttpClient User Discussion
Subject: Re: How can I find certain words in a html page?

Start by looking at String.matches(). If that will meet your needs, it 
could save you a bit of work.

Owen Smith wrote:

>Since you have a pretty exact idea of what surrounds the data that
>you're looking for a bit of work with regular expressions (regexps)
>should be enough to extract the data you want.  There are a bunch of
>packages you can use to provide regexp functionality.  A bit of
>searching with google should be enough to get you started.
>
>HtH,
>Owen
>
>On 10/12/05, Graeme <co...@hotmail.com> wrote:
>  
>
>>I am going to be using HTTPCLIENT to get the source of a web page and I am
>>hoping to be able to extract certain information from that webpage. It
>>will
>>all be HTML and I am looking for all the information between these tags
>>
>>    
>>
><snipped HTML stuff>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
>
>
>  
>




Re: How can I find certain words in a html page?

Posted by Thom Hehl <th...@nowhereatall.com>.
Start by looking at String.matches(). If that will meet your needs, it 
could save you a bit of work.
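
One caveat worth checking before relying on it: String.matches() succeeds only when the entire string matches the expression, so it tests a page rather than extracting from it. The sketch below contrasts it with Matcher.find(), which locates a match anywhere in the input. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    /** True only if the WHOLE input matches the regular expression. */
    static boolean wholeStringMatches(String input, String regex) {
        return input.matches(regex);
    }

    /** Returns the first substring that matches, or null if there is none. */
    static String firstMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String cell = "<td class=\"alt1\">42</td>";
        // The cell is more than two digits, so a whole-string test fails.
        System.out.println(wholeStringMatches(cell, "\\d{2}"));
        // find() scans for two consecutive digits anywhere in the cell.
        System.out.println(firstMatch(cell, "\\d{2}"));
    }
}
```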

Owen Smith wrote:

>Since you have a pretty exact idea of what surrounds the data that
>you're looking for a bit of work with regular expressions (regexps)
>should be enough to extract the data you want.  There are a bunch of
>packages you can use to provide regexp functionality.  A bit of
>searching with google should be enough to get you started.
>
>HtH,
>Owen
>
>On 10/12/05, Graeme <co...@hotmail.com> wrote:
>  
>
>>I am going to be using HTTPCLIENT to get the source of a web page and I am
>>hoping to be able to extract certain information from that webpage. It will
>>all be HTML and I am looking for all the information between these tags
>>
>>    
>>
><snipped HTML stuff>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
>
>
>  
>




Re: How can I find certain words in a html page?

Posted by Owen Smith <en...@gmail.com>.
Since you have a pretty exact idea of what surrounds the data that
you're looking for, a bit of work with regular expressions (regexps)
should be enough to extract the data you want.  There are a bunch of
packages you can use to provide regexp functionality.  A bit of
searching with Google should be enough to get you started.

HtH,
Owen
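
For reference, the java.util.regex package that ships with the JDK (1.4 and later) is one such option. The sketch below pulls the digits out of every cell with class "alt1" by using a capture group. The pattern is an assumption based on the HTML fragment in the original question, and the class name Alt1Cells is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Alt1Cells {
    // Group 1 captures the digits inside <td class="alt1"> ... </td>.
    // \s+ and \s* tolerate extra spaces, tabs, or line breaks in the markup.
    private static final Pattern CELL =
        Pattern.compile("<td\\s+class=\"alt1\">\\s*(\\d+)\\s*</td>");

    /** Returns the digit strings of all matching cells, in page order. */
    static List<String> numbers(String html) {
        List<String> out = new ArrayList<String>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like the fragment in the question.
        String html = "</td>\n\n\t<td class=\"alt1\">23</td>\n\t\n"
                    + "</tr><tr align=\"center\">\n"
                    + "<td class=\"alt2\">x</td>\n"
                    + "</td>\n\n\t<td class=\"alt1\">31</td>\n</tr>";
        System.out.println(numbers(html));
    }
}
```

Regular expressions work here because the surrounding markup is known and simple; for arbitrary HTML a real parser is usually the safer choice.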

On 10/12/05, Graeme <co...@hotmail.com> wrote:
> I am going to be using HTTPCLIENT to get the source of a web page and I am
> hoping to be able to extract certain information from that webpage. It will
> all be HTML and I am looking for all the information between these tags
>
<snipped HTML stuff>



How can I find certain words in a html page?

Posted by Graeme <co...@hotmail.com>.
I am going to be using HTTPCLIENT to get the source of a web page and I am
hoping to be able to extract certain information from that webpage. It will
all be HTML and I am looking for all the information between these tags

//... HTML Stuff here
</td>

	<td class="alt1">(Simple 2 digit number I need here)</td>
	
	
</tr><tr align="center">
//... More HTML Stuff after this as well
</td>

	<td class="alt1">(Simple 2 digit number I need here)</td>
	
	
</tr><tr align="center">
//... HTML Stuff after this as well
Etc.

I am thinking I am going to have to search through the
method.getResponseBody() for text that begins with </td> <td class="alt1">
and ends in </tr><tr align="center"> and get the data in the middle of them.

Although am I right in thinking I can't search through a line at a time? I
have to wait till the entire source comes in and then search through a
massive string?
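
Searching a line at a time is possible if the body is read as a stream: in HttpClient 3.x, getResponseBodyAsStream() returns an InputStream in place of the byte array. The sketch below shows only the line-by-line part with standard Java, using a ByteArrayInputStream as a stand-in for the response stream. The class name and pattern are assumptions, and a cell that is split across two lines would be missed by this approach.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineScanner {
    // Assumes each wanted cell sits on a single line, as in the fragment above.
    private static final Pattern CELL =
        Pattern.compile("<td class=\"alt1\">(\\d+)</td>");

    /** Reads the stream one line at a time and collects the matching numbers. */
    static List<String> scan(InputStream in, String charset) throws IOException {
        List<String> out = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = CELL.matcher(line);
                while (m.find()) {
                    out.add(m.group(1));
                }
            }
        } finally {
            reader.close();   // also closes the underlying stream
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream an HTTP response would provide.
        String page = "<tr>\n\t<td class=\"alt1\">23</td>\n</tr>\n"
                    + "<tr>\n\t<td class=\"alt1\">31</td>\n</tr>\n";
        InputStream in = new ByteArrayInputStream(page.getBytes("UTF-8"));
        System.out.println(scan(in, "UTF-8"));
    }
}
```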

Anyway, once I have the data I want to put it into a text file, which I
already know how to do.
Here's the code so far:

import java.io.*;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class HttpClientTutorial {

  private static String url = "http://www.youngcoders.com/memberlist.php";

  public static void main(String[] args) {
    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide a custom retry handler if necessary.
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and that the
      // body is not binary data. The charset is taken from the response.
      String page = new String(responseBody, method.getResponseCharSet());

      File outFile = new File("age.html");  // name of the output file

      BufferedWriter writer = new BufferedWriter(new FileWriter(outFile));
      try {
        writer.write(page);
      } finally {
        // Close the file even if the write fails.
        writer.close();
      }

      System.out.println(page);

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }
  }
}

At the moment that just gets the entire web page and puts it in a .html file,
but how do I get just certain bits from the page?

Thanks for your time, and if you don't understand anything just tell me and
I'll try to explain better.
