Posted to dev@jmeter.apache.org by Philippe Mouawad <ph...@gmail.com> on 2013/09/26 22:48:51 UTC

Re: htmlParser.className default value

Hello,
I really think this default should be changed, as
HtmlParserHTMLParser is catastrophic in terms of performance and
memory use.

At the very least a note should be added, but my preference is to switch to
RegexpHTMLParser, which seems to do the job.

Regards
Philippe


On Sun, Mar 3, 2013 at 9:06 PM, Philippe Mouawad <philippe.mouawad@gmail.com
> wrote:

> Hello,
>
> I recently ran a real-world test that downloaded resources.
> As the site started to slow down, I ended up with an OOM.
>
> Analyzing the heap dump, I noticed that one JMeterThread held around 3 MB,
> most of which was taken by the DOM built by htmlparser.
>
> So I think Regexp is far more efficient in memory usage. But if you say it
> is a quick and dirty alternative, then that is another matter.
>
> I wonder whether it would be worth exploring JSOUP for a new
> implementation.
>
> Regards
> Philippe
>
>
> On Sun, Mar 3, 2013 at 3:42 PM, sebb <se...@gmail.com> wrote:
>
>> On 2 March 2013 19:42, Philippe Mouawad <ph...@gmail.com>
>> wrote:
>> > Hello,
>> > I was wondering if there is any reason for htmlParser.className default
>> > value being org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser
>> and
>> > not org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
>> >
>> > It seems to me the latter is much more efficient than the current
>> default
>> > value.
>>
>> I think one would need to benchmark that to see how much faster it is.
>>
>> > Any objection on changing to
>> > org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
>>
>> The Regex version does not take account of context, so will find
>> references in comment sections.
>>
>> It was intended as a quick and dirty alternative.
>>
>> > --
>> > Regards.
>> > Philippe
>>
>
>
>
> --
> Cordialement.
> Philippe Mouawad.
>
>
>


-- 
Cordialement.
Philippe Mouawad.

Re: htmlParser.className default value

Posted by Philippe Mouawad <ph...@gmail.com>.
Created:
https://issues.apache.org/bugzilla/show_bug.cgi?id=55632

On Thu, Sep 26, 2013 at 11:05 PM, Philippe Mouawad <
philippe.mouawad@gmail.com> wrote:

>
>
>
> On Thu, Sep 26, 2013 at 10:58 PM, sebb <se...@gmail.com> wrote:
>
>> On 26 September 2013 21:48, Philippe Mouawad <ph...@gmail.com>
>> wrote:
>> > Hello,
>> > I really think this setting should be changed as
>> > HtmlParserHTMLParser is really catastrophic in terms of performance and
>> > memory use.
>> >
>> > Or at least a note should be added, but my preference goes to switching
>> to
>> > REGEXP which seems to be doing the job.
>>
>> I don't think we should change the default; it may well break test
>> plans as commenting out sections is a common practice.
>>
>
> Why not change the default and document how users can switch back to the
> old parser?
> A newcomer won't read all the documentation up front; in my opinion,
> the defaults should be the best options for performance.
>
> If users have issues with Regexp, we will get Bugzilla reports and fix
> them. They can provide the page for which parsing failed; we have already
> had a report like this, and it is easy.
>
> Whereas if we keep it as it is, users will hit OOM on high-load tests
> because of this, and I am not sure they will report it; if they do, it
> could be much harder to work out that this was the cause.
> And we will always have this "urban legend" about JMeter having OOM, which
> frankly is starting to upset me :-)
>
>
>> However by all means add a note to jmeter.properties and
>> component_reference
>>


-- 
Cordialement.
Philippe Mouawad.

Re: htmlParser.className default value

Posted by Philippe Mouawad <ph...@gmail.com>.
On Thu, Sep 26, 2013 at 10:58 PM, sebb <se...@gmail.com> wrote:

> On 26 September 2013 21:48, Philippe Mouawad <ph...@gmail.com>
> wrote:
> > Hello,
> > I really think this setting should be changed as
> > HtmlParserHTMLParser is really catastrophic in terms of performance and
> > memory use.
> >
> > Or at least a note should be added, but my preference goes to switching
> to
> > REGEXP which seems to be doing the job.
>
> I don't think we should change the default; it may well break test
> plans as commenting out sections is a common practice.
>

Why not change the default and document how users can switch back to the
old parser?
A newcomer won't read all the documentation up front; in my opinion,
the defaults should be the best options for performance.
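
As a rough illustration of "change the default, but let users restore the
old parser", here is a minimal Java sketch. It uses plain
java.util.Properties rather than JMeter's own property handling, which this
thread does not show. The class and method names are hypothetical; only the
property name and the two parser class names come from the thread, and the
Regexp default shown is the proposal under discussion, not current behaviour.

```java
import java.util.Properties;

public class ParserSettingDemo {

    // Property name and class names as quoted in this thread.
    static final String PROPERTY = "htmlParser.className";
    static final String REGEXP_PARSER =
            "org.apache.jmeter.protocol.http.parser.RegexpHTMLParser";
    static final String HTMLPARSER_PARSER =
            "org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser";

    // Hypothetical helper: returns the configured parser class name,
    // falling back to the proposed default when the property is not set.
    static String resolveParser(Properties props) {
        return props.getProperty(PROPERTY, REGEXP_PARSER);
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        // Nothing configured: the proposed default applies.
        System.out.println(resolveParser(props));

        // A user who needs the old behaviour sets the property explicitly,
        // the equivalent of one line in a properties file:
        //   htmlParser.className=org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser
        props.setProperty(PROPERTY, HTMLPARSER_PARSER);
        System.out.println(resolveParser(props));
    }
}
```

With this shape, existing users are only affected if they never set the
property, and restoring the old parser is a one-line change.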

If users have issues with Regexp, we will get Bugzilla reports and fix them.
They can provide the page for which parsing failed; we have already had a
report like this, and it is easy.

Whereas if we keep it as it is, users will hit OOM on high-load tests
because of this, and I am not sure they will report it; if they do, it
could be much harder to work out that this was the cause.
And we will always have this "urban legend" about JMeter having OOM, which
frankly is starting to upset me :-)


> However by all means add a note to jmeter.properties and
> component_reference



-- 
Cordialement.
Philippe Mouawad.

Re: htmlParser.className default value

Posted by sebb <se...@gmail.com>.
On 26 September 2013 21:48, Philippe Mouawad <ph...@gmail.com> wrote:
> Hello,
> I really think this setting should be changed as
> HtmlParserHTMLParser is really catastrophic in terms of performance and
> memory use.
>
> Or at least a note should be added, but my preference goes to switching to
> REGEXP which seems to be doing the job.

I don't think we should change the default; it may well break test
plans as commenting out sections is a common practice.

However, by all means add a note to jmeter.properties and component_reference.
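
The concern about commented-out sections can be illustrated with a minimal
Java sketch. It assumes a deliberately simplified pattern for
<img src="..."> tags, which is not the expression RegexpHTMLParser actually
uses, and the class and method names are hypothetical. A context-free regex
scan also reports the URL inside the HTML comment, whereas stripping
comments first does not:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedResourceDemo {

    // Simplified pattern for this sketch only; NOT the real parser's regex.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\s+[^>]*src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Matches HTML comments, including ones that span several lines.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Context-free scan: every textual match counts, even inside comments.
    static List<String> findImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Same scan, but with comments removed beforehand.
    static List<String> findImageUrlsIgnoringComments(String html) {
        return findImageUrls(COMMENT.matcher(html).replaceAll(""));
    }

    public static void main(String[] args) {
        String html = "<html><body>\n"
                + "<img src=\"/live.png\">\n"
                + "<!-- <img src=\"/disabled.png\"> -->\n"
                + "</body></html>";
        System.out.println(findImageUrls(html));                 // [/live.png, /disabled.png]
        System.out.println(findImageUrlsIgnoringComments(html)); // [/live.png]
    }
}
```

A test plan whose pages have resources commented out would therefore
download extra URLs under the plain scan, which is the breakage described
above.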
