Posted to jmeter-dev@jakarta.apache.org by js...@apache.org on 2004/02/06 01:05:54 UTC

cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java

jsalvata    2004/02/05 16:05:54

  Modified:    bin/testfiles HTMLParserTestCase.all HTMLParserTestCase.set
               src/protocol/http/org/apache/jmeter/protocol/http/parser
                        HTMLParser.java
  Log:
  Added a comment on a potential performance problem...
  ... and trimmed 10 seconds off test execution time.
  
  Revision  Changes    Path
  1.3       +14 -14    jakarta-jmeter/bin/testfiles/HTMLParserTestCase.all
  
  Index: HTMLParserTestCase.all
  ===================================================================
  RCS file: /home/cvs/jakarta-jmeter/bin/testfiles/HTMLParserTestCase.all,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- HTMLParserTestCase.all	14 Jan 2004 22:05:49 -0000	1.2
  +++ HTMLParserTestCase.all	6 Feb 2004 00:05:54 -0000	1.3
  @@ -1,14 +1,14 @@
  -http://myhost/mydir/images/image-a.gif
  -http://myhost/mydir/images/image-b.gif
  -http://myhost/mydir/images/image-b.gif
  -http://myhost/mydir/images/image-c.gif
  -http://myhost/mydir/images/image-d.gif
  -http://myhost/mydir/images/image-e.gif
  -http://myhost/mydir/images/image-f.gif
  -http://myhost/mydir/images/image-a2.gif
  -http://myhost/mydir/images/image-b2.gif
  -http://myhost/mydir/images/image-c2.gif
  -http://myhost/mydir/images/image-d2.gif
  -http://myhost/mydir/images/image-d2.gif
  -http://myhost/mydir/images/image-e2.gif
  -http://myhost/mydir/images/image-f2.gif
  +http://localhost/mydir/images/image-a.gif
  +http://localhost/mydir/images/image-b.gif
  +http://localhost/mydir/images/image-b.gif
  +http://localhost/mydir/images/image-c.gif
  +http://localhost/mydir/images/image-d.gif
  +http://localhost/mydir/images/image-e.gif
  +http://localhost/mydir/images/image-f.gif
  +http://localhost/mydir/images/image-a2.gif
  +http://localhost/mydir/images/image-b2.gif
  +http://localhost/mydir/images/image-c2.gif
  +http://localhost/mydir/images/image-d2.gif
  +http://localhost/mydir/images/image-d2.gif
  +http://localhost/mydir/images/image-e2.gif
  +http://localhost/mydir/images/image-f2.gif
  
  
  
  1.3       +12 -12    jakarta-jmeter/bin/testfiles/HTMLParserTestCase.set
  
  Index: HTMLParserTestCase.set
  ===================================================================
  RCS file: /home/cvs/jakarta-jmeter/bin/testfiles/HTMLParserTestCase.set,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- HTMLParserTestCase.set	14 Jan 2004 22:05:49 -0000	1.2
  +++ HTMLParserTestCase.set	6 Feb 2004 00:05:54 -0000	1.3
  @@ -1,12 +1,12 @@
  -http://myhost/mydir/images/image-a.gif
  -http://myhost/mydir/images/image-b.gif
  -http://myhost/mydir/images/image-c.gif
  -http://myhost/mydir/images/image-d.gif
  -http://myhost/mydir/images/image-e.gif
  -http://myhost/mydir/images/image-f.gif
  -http://myhost/mydir/images/image-a2.gif
  -http://myhost/mydir/images/image-b2.gif
  -http://myhost/mydir/images/image-c2.gif
  -http://myhost/mydir/images/image-d2.gif
  -http://myhost/mydir/images/image-e2.gif
  -http://myhost/mydir/images/image-f2.gif
  +http://localhost/mydir/images/image-a.gif
  +http://localhost/mydir/images/image-b.gif
  +http://localhost/mydir/images/image-c.gif
  +http://localhost/mydir/images/image-d.gif
  +http://localhost/mydir/images/image-e.gif
  +http://localhost/mydir/images/image-f.gif
  +http://localhost/mydir/images/image-a2.gif
  +http://localhost/mydir/images/image-b2.gif
  +http://localhost/mydir/images/image-c2.gif
  +http://localhost/mydir/images/image-d2.gif
  +http://localhost/mydir/images/image-e2.gif
  +http://localhost/mydir/images/image-f2.gif
  
  
  
  1.17      +14 -4     jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
  
  Index: HTMLParser.java
  ===================================================================
  RCS file: /home/cvs/jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java,v
  retrieving revision 1.16
  retrieving revision 1.17
  diff -u -r1.16 -r1.17
  --- HTMLParser.java	8 Jan 2004 13:54:16 -0000	1.16
  +++ HTMLParser.java	6 Feb 2004 00:05:54 -0000	1.17
  @@ -182,6 +182,16 @@
   			}
           	
   			return getEmbeddedResourceURLs(html, baseUrl,col);
  +            
  +            // An additional note on using HashSets to store URLs: I just
  +            // discovered that obtaining the hashCode of a java.net.URL implies
  +            // a domain-name resolution process. This means significant delays
  +            // can occur, even more so if the domain name is not resolvable.
  +            // Whether this can be a problem in practical situations I can't tell, but
  +            // thought I'd keep a note just in case...
  +            // BTW, note that using a Vector and removing duplicates via scan
  +            // would not help, since URL.equals requires name resolution too.
  +            // TODO: maybe change the API to return URL Strings instead of java.net.URLs?
           }
           
           // See whether we can use LinkedHashSet or not:
  @@ -294,13 +304,13 @@
           private static final TestData[] TESTS = new TestData[]{
           	new TestData(
           	             "testfiles/HTMLParserTestCase.html",
  -			             "http://myhost/mydir/myfile.html",
  +			             "http://localhost/mydir/myfile.html",
   			             "testfiles/HTMLParserTestCase.set",
   			              "testfiles/HTMLParserTestCase.all"
           	             ),
   			new TestData(
   			             "testfiles/HTMLParserTestCaseWithBaseHRef.html",
  -						 "http://myhost/mydir/myfile.html",
  +						 "http://localhost/mydir/myfile.html",
   						 "testfiles/HTMLParserTestCase.set",
   						  "testfiles/HTMLParserTestCase.all"
   						 ),
  @@ -318,7 +328,7 @@
   						 ),
               new TestData(
                            "testfiles/HTMLParserTestCaseWithComments.html",
  -                         "http://myhost/mydir/myfile.html",
  +                         "http://localhost/mydir/myfile.html",
                            "testfiles/HTMLParserTestCase.set",
                            "testfiles/HTMLParserTestCase.all"
                            ),
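The commit note says that hashing a java.net.URL can trigger a host-name lookup. That is presumably also why switching the test data from myhost to localhost trimmed the test time. One rough way to observe the effect is to time the first hashCode() call on a fresh URL. UrlHashTiming is a hypothetical harness, not JMeter code, and the numbers it prints depend entirely on the local resolver.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical timing harness (not part of JMeter).
public final class UrlHashTiming {

    /**
     * Returns the nanoseconds spent in the first hashCode() call on a fresh URL.
     * With the JDK's default URLStreamHandler this call may look up the host's
     * IP address; the result is cached in the URL, so later calls are cheap.
     */
    static long timeHashCode(String spec) {
        URL url;
        try {
            url = URI.create(spec).toURL(); // parsing only, no host lookup yet
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
        long start = System.nanoTime();
        url.hashCode();                     // may block on name resolution
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String spec = args.length > 0 ? args[0] : "http://localhost/mydir/myfile.html";
        System.out.println(spec + ": " + timeHashCode(spec) / 1_000_000 + " ms");
    }
}
```

Passing http://myhost/mydir/myfile.html, the host from the old test data, may take noticeably longer if the resolver has to wait for the lookup to fail.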
  
  
  

---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java

Posted by peter lin <jm...@yahoo.com>.
That's an interesting piece of information. I
definitely wasn't aware of that. Great job.

peter

--- Jordi Salvat i Alabart <js...@atg.com> wrote:
> I was thinking about this, and it's actually a bug in java.net.URL:
> since the advent of HTTP 1.1, http://foo.bar/index.html will very often
> be different from http://zutano.mengano/index.html even if both names
> point to the same IP.
>
> So I searched the Java Bug Parade and found bug #4434494:
> http://developer.java.sun.com/developer/bugParade/bugs/4434494.html. In
> the comments to that I found this:
>
> «
> [...] to address URI parsing in general, we introduced a new class
> called URI in Merlin (jdk1.4). People are encouraged to use URI for
> parsing and URI comparison, and leave URL class for accessing the URI
> itself, getting at the protocol handler, interacting with the protocol
> etc. So, at present, we don't plan on changing the URL.equals/hashCode
> behavior [...]
> »
>
> -- 
> Salut,
>
> Jordi.
>
> Jordi Salvat i Alabart wrote:
> >
> > sebb@apache.org wrote:
> >
> >>>  +            // TODO: maybe change the API to return URL Strings
> >>> instead of java.net.URLs?
> >>
> >> Since the HTTPSampler would need to re-create the URLs in order to use
> >> them, perhaps we could use URL.toString() as the hash key,
> >> and store the URL as the value? Would require more storage, but no
> >> need to recreate the URLs.
> >>
> >
> > GREAT idea! The need to recreate the URLs was actually the reason for
> > the question mark at the end of the phrase.
> >
> >> Might be fun to create a Collection to handle this automatically - or
> >> there may already be something suitable in Commons-Collections
> >> (which JMeter already uses - albeit by ListenerNotifier only!).
> >>
> >
> > Fun, yes. But probably overkill.
> >




Re: cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java

Posted by Jordi Salvat i Alabart <js...@atg.com>.
I was thinking about this, and it's actually a bug in java.net.URL: 
since the advent of HTTP 1.1, http://foo.bar/index.html will very often 
be different from http://zutano.mengano/index.html even if both names 
point to the same IP.
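The argument rests on name-based virtual hosting: an HTTP/1.1 client names the site it wants in the Host request header (RFC 2616 requires it in every HTTP/1.1 request), so one IP address can serve several sites. The hypothetical helper requestHead builds a minimal GET request head from a java.net.URI to show that the two example URLs yield different requests even if both names resolve to the same address. It assumes http URIs with a host and sends nothing over the network.

```java
import java.net.URI;

// Hypothetical helper (not JMeter code): the request head an HTTP/1.1 client would send.
public final class HostHeaderDemo {

    static String requestHead(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path means the root resource
        }
        String query = uri.getRawQuery();
        String target = query == null ? path : path + "?" + query;
        // The Host header carries the *name* from the URI, not its IP address.
        String host = uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(requestHead(URI.create("http://foo.bar/index.html")));
        System.out.print(requestHead(URI.create("http://zutano.mengano/index.html")));
    }
}
```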

So I searched the Java Bug Parade and found bug #4434494: 
http://developer.java.sun.com/developer/bugParade/bugs/4434494.html. In 
the comments to that I found this:

«
[...] to address URI parsing in general, we introduced a new class 
called URI in Merlin (jdk1.4). People are encouraged to use URI for 
parsing and URI comparison, and leave URL class for accessing the URI 
itself, getting at the protocol handler, interacting with the protocol 
etc. So, at present, we don't plan on changing the URL.equals/hashCode 
behavior [...]
»
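Following the advice in that comment, a comparison that never touches DNS can go through java.net.URI, whose equals() is purely syntactic (scheme and host names are compared ignoring case). The helper names sameUri and url are hypothetical. URL.toURI() only appeared in Java 5, after this thread; on JDK 1.4 one would write new URI(url.toString()) instead.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helpers: compare URLs as URIs, so no host lookup is performed.
public final class UriComparison {

    /** True if both URLs denote the same URI, compared syntactically. */
    static boolean sameUri(URL a, URL b) {
        try {
            return a.toURI().equals(b.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + e.getInput(), e);
        }
    }

    /** Parses a URL, turning the checked exception into an unchecked one. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        URL a = url("http://foo.bar/index.html");
        URL b = url("http://zutano.mengano/index.html");
        URL c = url("http://FOO.bar/index.html");
        System.out.println(sameUri(a, b)); // false: different host names
        System.out.println(sameUri(a, c)); // true: host names compared ignoring case
    }
}
```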

-- 
Salut,

Jordi.

Jordi Salvat i Alabart wrote:
> 
> 
> sebb@apache.org wrote:
> 
>>
>>>  +            // TODO: maybe change the API to return URL Strings 
>>> instead of java.net.URLs?
>>
>>
>>
>> Since the HTTPSampler would need to re-create the URLs in order to use 
>> them, perhaps we could use URL.toString() as the hash key,
>> and store the URL as the value? Would require more storage, but no 
>> need to recreate the URLs.
>>
> 
> GREAT idea! The need to recreate the URLs was actually the reason for 
> the question mark at the end of the phrase.
> 
>> Might be fun to create a Collection to handle this automatically - or 
>> there may already be something suitable in Commons-Collections
>> (which JMeter already uses - albeit by ListenerNotifier only!).
>>
> 
> Fun, yes. But probably overkill.
> 




Re: cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java

Posted by Jordi Salvat i Alabart <js...@atg.com>.

sebb@apache.org wrote:
> 
>>  +            // TODO: maybe change the API to return URL Strings instead of java.net.URLs?
> 
> 
> Since the HTTPSampler would need to re-create the URLs in order to use them, perhaps we could use URL.toString() as the hash key,
> and store the URL as the value? Would require more storage, but no need to recreate the URLs.
> 

GREAT idea! The need to recreate the URLs was actually the reason for 
the question mark at the end of the phrase.

> Might be fun to create a Collection to handle this automatically - or there may already be something suitable in Commons-Collections
> (which JMeter already uses - albeit by ListenerNotifier only!).
> 

Fun, yes. But probably overkill.

-- 
Salut,

Jordi.




Re: cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java

Posted by se...@apache.org.
----- Original Message ----- 
From: <js...@apache.org>
To: <ja...@apache.org>
Sent: Friday, February 06, 2004 12:05 AM
Subject: cvs commit: jakarta-jmeter/src/protocol/http/org/apache/jmeter/protocol/http/parser HTMLParser.java


> jsalvata    2004/02/05 16:05:54
>
>   Modified:    bin/testfiles HTMLParserTestCase.all HTMLParserTestCase.set
>                src/protocol/http/org/apache/jmeter/protocol/http/parser
>                         HTMLParser.java
>   Log:
>   Added a comment on a potential performance problem...
>   ... and trimmed 10 seconds off test execution time.

Excellent!

>   +            // An additional note on using HashSets to store URLs: I just
>   +            // discovered that obtaining the hashCode of a java.net.URL implies
>   +            // a domain-name resolution process. This means significant delays
>   +            // can occur, even more so if the domain name is not resolvable.
>   +            // Whether this can be a problem in practical situations I can't tell, but
>   +            // thought I'd keep a note just in case...


Hopefully the host names would already be cached, so that resolving them would not take long - and if not cached, hashing the
URL should cache the host name ready for HTTPSampler. I'm not sure whether non-existent host names are cached; if not, they would incur
a double penalty.
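On the caching question: in current JDKs, java.net.InetAddress caches both successful and failed lookups, governed by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl (the shipped java.security file sets the negative one to 10 seconds). How the 1.4-era JDKs behaved may differ. The sketch below only reads and sets those properties; the class and method names are hypothetical.

```java
import java.security.Security;

// Sketch: reads and sets the JVM-wide InetAddress cache TTLs (in seconds).
public final class DnsCacheSettings {

    static final String POSITIVE_TTL = "networkaddress.cache.ttl";
    static final String NEGATIVE_TTL = "networkaddress.cache.negative.ttl";

    /** Returns the security property's value, or null if it is not set. */
    static String get(String name) {
        return Security.getProperty(name);
    }

    /**
     * Sets how long failed lookups are cached. The JDK reads this once, so it
     * generally has to be set before the first host-name lookup.
     */
    static void setNegativeTtlSeconds(int seconds) {
        Security.setProperty(NEGATIVE_TTL, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        System.out.println(POSITIVE_TTL + " = " + get(POSITIVE_TTL));
        System.out.println(NEGATIVE_TTL + " = " + get(NEGATIVE_TTL));
    }
}
```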

>   +            // BTW, note that using a Vector and removing duplicates via scan
>   +            // would not help, since URL.equals requires name resolution too.

[It looks like Mr Gosling was responsible for the decision to compare hosts using their IP addresses!]

Equal objects must have equal hash codes, so if equals() compares IP addresses, hashCode() must use them as well.
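The contract, sketched with a hypothetical key class: whatever equals() compares, hashCode() must be computed from the same thing. java.net.URL compares hosts by IP address, so its hashCode() needs the address too. HostKey uses a case-insensitive host name instead, which shows the same rule without any network access.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical key class: equals() and hashCode() derive from the same normalized field.
public final class HostKey {
    private final String normalizedHost;

    public HostKey(String host) {
        this.normalizedHost = host.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HostKey && normalizedHost.equals(((HostKey) o).normalizedHost);
    }

    @Override
    public int hashCode() {
        // Using the raw host here would break the contract: equal keys could
        // land in different hash buckets and a HashSet would keep both.
        return normalizedHost.hashCode();
    }

    public static void main(String[] args) {
        Set<HostKey> hosts = new HashSet<>();
        hosts.add(new HostKey("Foo.Bar"));
        hosts.add(new HostKey("foo.bar"));
        System.out.println(hosts.size()); // 1
    }
}
```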

>   +            // TODO: maybe change the API to return URL Strings instead of java.net.URLs?

Since the HTTPSampler would need to re-create the URLs in order to use them, perhaps we could use URL.toString() as the hash key,
and store the URL as the value? Would require more storage, but no need to recreate the URLs.
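The suggestion can be sketched with a LinkedHashMap keyed by the URL's string form: String.hashCode() needs no name resolution, and the stored URL objects can be handed to the sampler unchanged. The names UrlDedup, dedupe and url are hypothetical. The deduplication is purely textual, so http://LOCALHOST/x and http://localhost/x would be kept as two entries.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: removes duplicate URLs without calling URL.hashCode()/equals().
public final class UrlDedup {

    /** Keeps the first URL seen for each string form, in encounter order. */
    static List<URL> dedupe(Iterable<URL> urls) {
        Map<String, URL> byString = new LinkedHashMap<>();
        for (URL u : urls) {
            // The key is a String, so hashing it never triggers a DNS lookup.
            byString.putIfAbsent(u.toExternalForm(), u);
        }
        return new ArrayList<>(byString.values());
    }

    /** Parses a URL, turning the checked exception into an unchecked one. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        List<URL> found = new ArrayList<>();
        found.add(url("http://localhost/mydir/images/image-a.gif"));
        found.add(url("http://localhost/mydir/images/image-b.gif"));
        found.add(url("http://localhost/mydir/images/image-b.gif"));
        for (URL u : dedupe(found)) {
            System.out.println(u); // the image-a.gif URL, then the image-b.gif URL
        }
    }
}
```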

Might be fun to create a Collection to handle this automatically - or there may already be something suitable in Commons-Collections
(which JMeter already uses - albeit by ListenerNotifier only!).
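A Collection that does this automatically could be as small as the following hypothetical UrlStringSet (it is not a JMeter or Commons-Collections class). Membership is decided by toExternalForm() rather than URL.equals(), so add, contains and remove never resolve host names, and iteration follows insertion order like LinkedHashSet.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Set<URL> keyed internally by each URL's string form.
public final class UrlStringSet extends AbstractSet<URL> {
    private final Map<String, URL> byString = new LinkedHashMap<>();

    /** Convenience factory: parses each spec and adds it to a new set. */
    static UrlStringSet of(String... specs) {
        UrlStringSet set = new UrlStringSet();
        for (String spec : specs) {
            try {
                set.add(URI.create(spec).toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad URL: " + spec, e);
            }
        }
        return set;
    }

    @Override
    public boolean add(URL url) {
        // Returns false when a URL with the same string form is already present.
        return byString.putIfAbsent(url.toExternalForm(), url) == null;
    }

    @Override
    public boolean contains(Object o) {
        return o instanceof URL && byString.containsKey(((URL) o).toExternalForm());
    }

    @Override
    public boolean remove(Object o) {
        return o instanceof URL && byString.remove(((URL) o).toExternalForm()) != null;
    }

    @Override
    public Iterator<URL> iterator() {
        return byString.values().iterator(); // supports remove(), as AbstractSet expects
    }

    @Override
    public int size() {
        return byString.size();
    }
}
```

One caveat: the set's own hashCode(), inherited from AbstractSet, still sums URL.hashCode() values, so such a set should not itself be used as a hash key.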

S.

