Posted to jmeter-dev@jakarta.apache.org by Jordi Salvat i Alabart <js...@atg.com> on 2003/11/23 04:14:48 UTC

HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Hi.

I've finally found some time to test the performance of the 
HTTPSamplerFull implementation currently in CVS (developed by Peter Lin 
using HTMLParser) against the implementation I sent a while ago to the 
list (developed by me using Regexps). [Remember: the objective is not 
to decide which is best, but whether it's worth having both available to 
script developers].

The results are not conclusive, but they prove that the issue deserves 
further analysis:

1/ On the example I've been using, the Regexp-based implementation was 
more accurate than the HTMLParser-based one. This is very surprising to 
me, since I expected the Regexp-based implementation to be generally 
less accurate. I'll need some help on this one. More details later.

2/ On the example I've been using, the Regexp-based implementation was 
at least 7 times faster than the HTMLParser-based one. A quick look at 
the code suggests that the HTML Parser is being called 5 times (one for 
each tag of interest: img, applet, input, body, table). Am I correct? 
The regexp-based implementation only scans through the HTML once. This 
could well explain most of the performance difference. Is there any way 
to recode the HTMLParser-based implementation to do the job in a single 
scan?
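
To illustrate what "scans through the HTML once" can look like, here is a 
minimal sketch using java.util.regex. It is not the sampler code discussed 
in this thread: the class name, the pattern, and the choice of attributes 
(src, code, background) are assumptions made for illustration. It also 
takes shortcuts, handling only quoted attribute values and not skipping 
comments or scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the HTTPSamplerFull implementation.
class EmbeddedResourceScanner {

    // One alternation covers every tag of interest, so the text is scanned once.
    // Group 1: <img src> or <input src>; group 2: <applet code>;
    // group 3: <body background> or <table background>.
    // Assumption: attribute values are quoted.
    private static final Pattern RESOURCE = Pattern.compile(
        "<(?:img|input)\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']"
        + "|<applet\\b[^>]*?\\bcode\\s*=\\s*[\"']([^\"']*)[\"']"
        + "|<(?:body|table)\\b[^>]*?\\bbackground\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the attribute values in document order.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<String>();
        Matcher m = RESOURCE.matcher(html);
        while (m.find()) {
            // Exactly one group is non-null per match; pick it up.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<body background=\"bg.gif\"><img src='a.gif'>"
            + "<table background=\"t.gif\"><input type=\"image\" src=\"go.gif\"></table>"
            + "<applet code=\"Clock.class\"></applet></body>";
        // prints [bg.gif, a.gif, t.gif, go.gif, Clock.class]
        System.out.println(extract(html));
    }
}
```

Because all five tags share one compiled pattern, the cost is a single 
pass over the page regardless of how many tag types are of interest.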

How to reproduce the test:
- Get Apache and JMeter running (I'm running both on the same box, which 
is probably a bad idea).
- Uncompress the attached test-httpsamplerfull.tgz in the Apache 
docroot. It contains a Yahoo home page saved using Mozilla 1.5. (A 
proper test would use several other samples).
- Run the attached script and look at the Rate in the Aggregate Report.

On my IBM T30 with Pentium 4 M @ 2.2 GHz, 1 GB RAM, with JDK 1.4.2_02, 
no fiddling with the java arguments (yes, that means I'm using -Xincgc, 
which is probably the worst possible choice) I'm getting around 1 
sample/second with the HTMLParser-based sampler and around 7 
samples/second with the Regexp-based one.

In addition, the HTMLParser-based implementation is failing to download 
two images: powrdbyhp_blu_84x28_yahoo.gif (it is downloading the HTML 
page again instead) and 031121_l300.gif (it downloads nothing). I've 
used Mozilla's "Live HTTP Headers" to see what Mozilla does and it 
matches what the Regexp-based implementation is doing. I'd say there's a 
bug in the HTMLParser. Can someone familiar with it have a look? (Hi 
Peter!).

-- 
Salut,

Jordi.

Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by Jordi Salvat i Alabart <js...@atg.com>.

Sorry for replying to myself...

Jordi Salvat i Alabart wrote:
> [...] the HTML Parser is being called 5 times (one for 
> each tag of interest: img, applet, input, body, table). Am I correct? 

No, I wasn't. I got confused by the JTidy implementation, which seems to 
be going through the DOM nodes (not the HTML text) five times.

Which is a pity, because it means I no longer see any obvious way to 
improve the performance of the HTMLParser implementation.
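
For reference, going through DOM nodes five times is avoidable in 
principle: a single walk can test each element against the whole set of 
tags. The sketch below only illustrates that idea. It uses the JDK's own 
XML parser on a well-formed snippet rather than JTidy, and the class and 
method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not the JTidy-based sampler code.
class SinglePassDomWalker {

    // The five tags of interest named in this thread.
    private static final Set<String> TAGS =
        new HashSet<String>(Arrays.asList("img", "applet", "input", "body", "table"));

    // Pre-order walk: every node is visited once and checked against all tags.
    static List<String> collect(Node node, List<String> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase(Locale.ROOT);
            if (TAGS.contains(name)) {
                found.add(name);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, found);
        }
        return found;
    }

    // Assumption: the input is well-formed XML, which real-world HTML often is not.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><table><tr><td><img src=\"a.gif\"/></td></tr></table>"
            + "<input type=\"image\" src=\"go.gif\"/></body></html>");
        // prints [body, table, img, input]
        System.out.println(collect(doc, new ArrayList<String>()));
    }
}
```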

-- 
Salut,

Jordi.


---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-dev-help@jakarta.apache.org

Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by peter lin <jm...@yahoo.com>.

hi jordi,

I couldn't download the attachment you added to the
bug. can you send it to me directly and i'll try to
get to it next month after the holidays.

thanks.


peter

--- Jordi Salvat i Alabart <js...@atg.com> wrote:
> No, it doesn't. JTidy works well.
> 
> I'm suspecting your guess is wrong... :-)
> 
> -- 
> Salut,
> 
> Jordi.
> 
> peter lin wrote:
> > can you verify if the old JTidy implementation
> > contains the same bug?
> > 
> > I'm going to guess it's how I'm using htmlparser.
> > 
> > peter

Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by Jordi Salvat i Alabart <js...@atg.com>.

No, it doesn't. JTidy works well.

I'm suspecting your guess is wrong... :-)

-- 
Salut,

Jordi.

peter lin wrote:
> can you verify if the old JTidy implementation
> contains the same bug?
> 
> I'm going to guess it's how I'm using htmlparser.
> 
> peter

Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by peter lin <jm...@yahoo.com>.

can you verify if the old JTidy implementation
contains the same bug?

I'm going to guess it's how I'm using htmlparser.

peter



Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by Jordi Salvat i Alabart <js...@atg.com>.

Responding to myself again...

I've been running some more tests with JVM arguments that I believe are 
more sensible, namely:

-Xms256m -Xmx256m -XX:NewSize=64m -XX:MaxNewSize=64m 
-XX:MaxLiveObjectEvacuationRatio=40 -XX:SurvivorRatio=8

With this, the performance difference has almost disappeared: I'm 
getting ca. 12 sample/second with the htmlparser, 15 sample/second with 
the regexp approach. The htmlparser solution generates about 5 times 
more garbage than the regexp solution -- which explains why the results 
were so tremendously different using -Xincgc.
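
One rough way to compare how much garbage-collection work two code paths 
trigger is to count collections around each workload. The sketch below is 
an assumption-laden illustration, not how the figures above were obtained: 
it relies on java.lang.management, which only arrived in Java 5 (on JDK 
1.4.2 the -verbose:gc flag would be the closest equivalent), and the class 
and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe for comparing GC activity between two workloads.
class GcPressureProbe {

    // Sums the collection counts reported by all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Number of collections that happened while the workload ran.
    static long collectionsDuring(Runnable work) {
        long before = totalCollections();
        work.run();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload that allocates short-lived arrays; a real
        // comparison would run each parser over the same saved page.
        long collections = collectionsDuring(new Runnable() {
            public void run() {
                for (int i = 0; i < 200000; i++) {
                    byte[] garbage = new byte[10000];
                    garbage[0] = 1;
                }
            }
        });
        System.out.println("Collections during workload: " + collections);
    }
}
```

Collection counts are a coarse proxy: they depend on heap and generation 
sizes, so both workloads need to run under the same JVM arguments for the 
numbers to be comparable.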

In this situation, I don't believe it's worth providing users with the 
ability to choose which parser they want. I won't remove them now, but I 
believe HtmlParser is the best choice, once we have managed to 
clean up the outstanding bugs.

The bugs I mentioned before (failure to parse a couple of image URLs) 
still hold. I'll file them now.

-- 
Salut,

Jordi.

Jordi Salvat i Alabart wrote:
> ------------------------------------------------------------------------
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <node>
> <testelement class="org.apache.jmeter.testelement.TestPlan">
> <testelement class="org.apache.jmeter.config.Arguments" name="TestPlan.user_defined_variables">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.config.gui.ArgumentsPanel</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.config.Arguments</property>
> <collection class="java.util.ArrayList" propType="org.apache.jmeter.testelement.property.CollectionProperty" name="Arguments.arguments"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Argument List</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> </testelement>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.control.gui.TestPlanGui</property>
> <collection class="java.util.LinkedList" propType="org.apache.jmeter.testelement.property.CollectionProperty" name="TestPlan.thread_groups"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.testelement.TestPlan</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestPlan.serialize_threadgroups">false</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Test Plan</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestPlan.functional_mode">false</property>
> </testelement>
> <node>
> <testelement class="org.apache.jmeter.threads.ThreadGroup">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.threads.gui.ThreadGroupGui</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.LongProperty" name="ThreadGroup.start_time">0</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.threads.ThreadGroup</property>
> <testelement class="org.apache.jmeter.control.LoopController" name="ThreadGroup.main_controller">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.control.gui.LoopControlPanel</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.IntegerProperty" name="LoopController.loops">-1</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.control.LoopController</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Loop Controller</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="LoopController.continue_forever">false</property>
> </testelement>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Thread Group</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.LongProperty" name="ThreadGroup.end_time">0</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="ThreadGroup.duration"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="ThreadGroup.on_sample_error">continue</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="ThreadGroup.num_threads">10</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="ThreadGroup.scheduler">false</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="ThreadGroup.ramp_time">1</property>
> </testelement>
> <node>
> <testelement class="org.apache.jmeter.protocol.http.sampler.HTTPSampler">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.path">/test-httpsamplerfull/Yahoo!.htm</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.protocol.http.sampler.HTTPSampler</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.encoded_path">/test-httpsamplerfull/Yahoo!.htm</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.method">GET</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="HTTPSampler.use_keepalive">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.protocol">http</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="HTTPSampler.image_parser">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="HTTPSampler.follow_redirects">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.port"/>
> <testelement class="org.apache.jmeter.config.Arguments" name="HTTPsampler.Arguments">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.protocol.http.gui.HTTPArgumentsPanel</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.config.Arguments</property>
> <collection class="java.util.LinkedList" propType="org.apache.jmeter.testelement.property.CollectionProperty" name="Arguments.arguments"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Argument List</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> </testelement>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.mimetype"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.protocol.http.control.gui.HttpTestSampleGui</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.FILE_FIELD"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">HTTP Request</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.domain">localhost</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="HTTPSampler.FILE_NAME"/>
> </testelement>
> </node>
> <node>
> <testelement class="org.apache.jmeter.reporters.ResultCollector">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.visualizers.StatVisualizer</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.reporters.ResultCollector</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">Aggregate Report</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">true</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="filename"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="ResultCollector.error_logging">false</property>
> </testelement>
> </node>
> <node>
> <testelement class="org.apache.jmeter.reporters.ResultCollector">
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.gui_class">org.apache.jmeter.visualizers.ViewResultsFullVisualizer</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.test_class">org.apache.jmeter.reporters.ResultCollector</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="TestElement.name">View Results Tree</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="TestElement.enabled">false</property>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.StringProperty" name="filename"/>
> <property xml:space="preserve" propType="org.apache.jmeter.testelement.property.BooleanProperty" name="ResultCollector.error_logging">false</property>
> </testelement>
> </node>
> </node>
> </node>
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-dev-help@jakarta.apache.org

Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by Jordi Salvat i Alabart <js...@atg.com>.


peter lin wrote:
> [...]   My
> guess is regexp will be faster than either JTidy or
> HTMLParser, but the hard part is extensibility.

I too was guessing regexp would be faster. But what I'm seeing is that 
it's only marginally faster, so it's not worth the added difficulty (and 
I absolutely agree it is more difficult). I was thinking that the 
improved performance could compensate for the difficulty, but that 
doesn't seem to be the case: a 20% improvement doesn't justify anything.

-- 
Salut,

Jordi.



Re: HTTPSamplerFull performance: HTMLParser vs. Regexp-based

Posted by peter lin <jm...@yahoo.com>.

Those are interesting results.  My main reason for not
using regexp is that I am not an expert at it.  My
guess is regexp will be faster than either JTidy or
HTMLParser, but the hard part is extensibility.
Extending HTMLParser is considerably easier than
mastering regexp for me.

--- Jordi Salvat i Alabart <js...@atg.com> wrote:
> Hi.
> 
> I've finally found some time to test the performance of the
> HTTPSamplerFull implementation currently in CVS (developed by Peter
> Lin using HTMLParser) against the implementation I sent a while ago
> to the list (developed by me using Regexps). [Remember: the objective
> is not to decide which is best, but whether it's worth having both
> available to script developers].
> 
> The results are not conclusive, but they prove that the issue
> deserves further analysis:
> 
> 1/ On the example I've been using, the Regexp-based implementation
> was more accurate than the HTMLParser-based one. This is very
> surprising to me, since I expected the Regexp-based implementation to
> be generally less accurate. I'll need some help on this one. More
> details later.

I can file a bug report with the HTMLParser developers.
It worked for the tests I ran, which were basically the
benchmark classes in the test directory. It could also be
a bug in how I originally implemented it in
NewHTTPSampler, which Sebastian refactored last week.
HTMLParser works by registering scanners (listeners)
with the parser, so it does everything in one pass. I
believe the cost is in building a structured object.

protected static void addTagListeners(Parser parser)
{
    log.debug("Start : addTagListeners");
    // add body tag scanner
    parser.addScanner(new BodyScanner());
    // add image tag scanner (created from a LinkScanner)
    LinkScanner linkScanner = new LinkScanner(LinkTag.LINK_TAG_FILTER);
    // parser.addScanner(linkScanner);
    parser.addScanner(
        linkScanner.createImageScanner(ImageTag.IMAGE_TAG_FILTER));
    // add input tag scanner
    parser.addScanner(new InputTagScanner());
    // add applet tag scanner
    parser.addScanner(new AppletScanner());
}

You'll see that parse is only called once.

try {
    // we start to iterate through the elements
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)

> 
> 2/ On the example I've been using, the Regexp-based implementation
> was at least 7 times faster than the HTMLParser-based one. A quick
> look at the code suggests that the HTML Parser is being called 5
> times (one for each tag of interest: img, applet, input, body,
> table). Am I correct? The regexp-based implementation only scans
> through the HTML once. This could well explain most of the
> performance difference. Is there any way to recode the
> HTMLParser-based implementation to do the job in a single scan?

By design, a regexp is essentially a finite state machine
and therefore should be faster. But the challenge here is
this: if you need to parse an element that has subnodes,
doing it in regexp is harder. For example, say users want
the ability to parse a specific table. I believe doing
that in regexp would require multiple steps. It may still
be faster, but it would probably take much more work to do
the same task. I could be wrong.
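
A minimal sketch of what a single-scan regexp extraction might look like, using java.util.regex. This is not the code from either implementation discussed in this thread: the class name, the pattern, and the choice of tags and attributes (src and background on img, input, body and table) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from either sampler implementation.
final class SinglePassResourceScan {

    // One alternation covers all tags of interest, so the HTML is scanned once.
    // Groups 1-3 hold a double-quoted, single-quoted or unquoted attribute value.
    // Assumption: any listed tag may carry either attribute (slightly too lenient).
    private static final Pattern RESOURCE = Pattern.compile(
        "<(?:img|input|body|table)\\b[^>]*?\\b(?:src|background)\\s*=\\s*"
            + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the attribute values in document order.
    static List<String> findResources(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = RESOURCE.matcher(html);
        while (m.find()) {
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                    break;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<body background=\"bg.gif\"><table><tr><td>"
            + "<img alt=\"x\" src=\"a.gif\"></td></tr></table>"
            + "<IMG SRC='b.gif'><input type=image src=c.gif></body>";
        System.out.println(findResources(html));
    }
}
```

The scan is flat: it reports every matching attribute but has no notion of which table or other parent element an image sits in. Restricting the result to one specific table would need an extra step, which is the extensibility cost described above.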

> In addition, the HTMLParser-based implementation is failing to
> download two images: powrdbyhp_blu_84x28_yahoo.gif (it is downloading
> the HTML page again instead) and 031121_l300.gif (it downloads
> nothing). I've used Mozilla's "Live HTTP Headers" to see what Mozilla
> does and it matches what the Regexp-based implementation is doing.
> I'd say there's a bug in the HTMLParser. Can someone familiar with it
> have a look? (Hi Peter!).

You can look at the benchmark classes I wrote to test
the performance against JTidy before I implemented the
sampler.

http://cvs.apache.org/viewcvs/jakarta-jmeter/src/htmlparser/org/htmlparser/tests/BenchmarkP.java
http://cvs.apache.org/viewcvs/jakarta-jmeter/src/htmlparser/org/htmlparser/tests/BenchmarkTidy.java

When I did a test using the CNet and Yahoo homepages, it
did correctly get all the image tags. Is the image a
banner? I did notice banners weren't loaded in my
test, but that was because the link pointed to another
server. I believe this may be the result of how I
implemented support. The current implementation gets
the image and input tags. That was how the JTidy version
was implemented, so I unknowingly ported the bad
implementation. Or am I missing something?
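
One way to tell a parsing problem from a URL-resolution problem is to check how each extracted src value resolves against the page URL. The sketch below uses java.net.URI for this; the class name, method names and sample image paths are hypothetical and do not come from the sampler code.

```java
import java.net.URI;

// Hypothetical helper for checking how an image reference resolves.
final class ResourceUrlResolver {

    // Resolves a src value (relative or absolute) against the page URL.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    // True when the resolved resource lives on the same host as the page.
    static boolean sameHost(String pageUrl, String resolvedUrl) {
        String pageHost = URI.create(pageUrl).getHost();
        String resourceHost = URI.create(resolvedUrl).getHost();
        return pageHost != null && pageHost.equalsIgnoreCase(resourceHost);
    }

    public static void main(String[] args) {
        String page = "http://localhost/test-httpsamplerfull/Yahoo!.htm";
        // Sample src values are made up for illustration.
        String[] sources = { "img/logo.gif", "http://ads.example.com/banner.gif" };
        for (String src : sources) {
            String resolved = resolve(page, src);
            System.out.println(resolved + " sameHost=" + sameHost(page, resolved));
        }
    }
}
```

If a reference that the browser fetches resolves to a different host, the image is missing because of how cross-server links are handled rather than because the tag was not found.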

I'm pretty busy these days, so I may not have time to
fix it for a week or two. I think doing what works, or
what users expect, is the right decision. If HTMLParser
doesn't meet the performance requirements, then I see
no reason to lock JMeter to HTMLParser. In defense of
HTMLParser, it is a solid library and does improve the
throughput of JMeter. The design seems sound and very
extensible to me. Everyone has different preferences, so
maybe support both ways of parsing HTML? For myself, I
don't really have time to become a regexp guru. One of
the approaches I considered was to write a stack-based
HTML parser from scratch. Ultimately I decided
HTMLParser provided the features and extensibility.
Plus, I wasn't confident that, for complex use cases, a
stack-based parser would be easy for others to extend
and use.

On a benchmark note, I ran stand-alone benchmarks with
just JTidy and HTMLParser, and with NewHTTPSamplerFull.
I also profiled the sampler using OptimizeIt. All of
the data I generated showed that HTMLParser provides a
real benefit.

peter

