
Posted to jmeter-dev@jakarta.apache.org by Hideaki Kimura <hk...@be.to> on 2006/03/23 13:31:38 UTC

In order to update htmlparser and isolate it

Hi everyone,

I have written some code for JMeter.
Before committing it to SVN, I'd like to consult developers
who have a lot of experience with JMeter.

First of all, let me explain why I wrote the code.
When I tried to parse HTML in a BeanShell Sampler for correlation,
I had some trouble with the older htmlparser.
JMeter uses htmlparser 1.3, not the latest version, 1.6,
which fixes many bugs and has powerful NodeFilters.

However, since htmlparser is under the LGPL while JMeter is under
the Apache License, I first have to make JMeter work well without
htmlparser before updating htmlparser to 1.6.
To do that, I added this method to HTMLParser.java in
"src/protocol/http/org/apache/jmeter/protocol/http/parser":

/**
 * Parsers should override this method if the parser might be
 * "not ready" to use in some situations.
 * @return true if the HTMLParser is ready to use.
 */
protected boolean isValid() {
	return true;
}

and the following lines right after the place where the HTMLParser is instantiated:

if (!pars.isValid()) {
	log.warn(htmlParserClassName + " can't be used. Instead, RegexpHTMLParser is used.");
	pars = new RegexpHTMLParser(); // RegexpHTMLParser is always ready to use.
} else {
	log.info("Created " + htmlParserClassName);
}

I also added the following lines to HtmlParserHTMLParser.java:
/** {@inheritDoc} */
protected boolean isValid() {
	// Check whether the htmlparser classes can be loaded.
	try {
		new Parser();
	} catch (NoClassDefFoundError e) {
		return false;
	}
	return true;
}

This code enables JMeter to extract links from downloaded HTML
even when htmlparser.jar is not present.
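
The fallback idea described above can be sketched in a self-contained form.
This is only an illustration, not the actual patch: the class names
(ParserFallbackSketch, LinkParser, RegexLinkParser, LibraryLinkParser) and
the choose() helper are hypothetical, and the availability check uses
Class.forName rather than catching NoClassDefFoundError from a constructor
call as the snippets above do.

```java
import java.util.logging.Logger;

public class ParserFallbackSketch {
    private static final Logger LOG = Logger.getLogger("ParserFallbackSketch");

    /** Hypothetical base type: parsers are assumed usable unless they say otherwise. */
    static abstract class LinkParser {
        protected boolean isValid() {
            return true;
        }
    }

    /** Hypothetical stand-in for a parser with no optional dependencies. */
    static class RegexLinkParser extends LinkParser {
    }

    /** Hypothetical stand-in for a parser backed by an optional library. */
    static class LibraryLinkParser extends LinkParser {
        private final String requiredClass;

        LibraryLinkParser(String requiredClass) {
            this.requiredClass = requiredClass;
        }

        @Override
        protected boolean isValid() {
            try {
                // initialize=false: only check that the class can be located.
                Class.forName(requiredClass, false,
                        LibraryLinkParser.class.getClassLoader());
                return true;
            } catch (ClassNotFoundException | LinkageError e) {
                return false;
            }
        }
    }

    /** Returns the preferred parser if it is usable, otherwise the regex one. */
    static LinkParser choose(LinkParser preferred) {
        String name = preferred.getClass().getSimpleName();
        if (!preferred.isValid()) {
            LOG.warning(name + " can't be used. Falling back to RegexLinkParser.");
            return new RegexLinkParser();
        }
        LOG.info("Created " + name);
        return preferred;
    }

    public static void main(String[] args) {
        // java.lang.String is always present, so the preferred parser is kept.
        System.out.println(choose(new LibraryLinkParser("java.lang.String"))
                .getClass().getSimpleName());
        // A class name that does not exist triggers the fallback.
        System.out.println(choose(new LibraryLinkParser("org.example.missing.Parser"))
                .getClass().getSimpleName());
    }
}
```

Checking by class name keeps the sketch free of any compile-time reference
to the optional library; whether that is preferable to the constructor-based
check is a design choice this sketch does not settle.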


After making these changes, I deleted src/htmlparser and added
  filterbuilder.jar
  htmllexer.jar
  htmlparser.jar
  sax2.jar
  thumbelina.jar
to the classpath instead.
Since HtmlParserHTMLParser uses many methods and classes
of htmlparser that no longer exist in the latest version,
I also had to change HtmlParserHTMLParser substantially.


As a result, I now have a JMeter which, in the default configuration
where HtmlParserHTMLParser is used, can extract links from HTML
even without htmlparser, and which also works with htmlparser 1.6.
It is working well in my environment.
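
To illustrate what extracting links without htmlparser can look like, here
is a minimal sketch using only java.util.regex. It is not the code of
JMeter's RegexpHTMLParser; the class name RegexLinkSketch, the
extractLinks() helper and the pattern are assumptions made for this example,
and the pattern only handles quoted href/src attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLinkSketch {
    // Matches href="..." or src='...' (either quote style, any letter case).
    // Deliberately simplistic: unquoted attribute values are not handled.
    private static final Pattern LINK = Pattern.compile(
            "(?:href|src)\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the attribute values found, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"page.html\">x</a><img SRC='logo.png'>";
        System.out.println(extractLinks(html)); // prints [page.html, logo.png]
    }
}
```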

However, I'm not sure whether I should commit it.
While these changes make it possible to use htmlparser 1.6 in a
BeanShell script, they also make it impossible to use htmlparser 1.3.
I tried to write an HtmlParserHTMLParser that works with both 1.3
and 1.6, but it is not possible. Too much has changed in
htmlparser, such as the package of "Tag".

Also, the build and release process has to be
changed to some degree, since htmlparser has to be separated
from JMeter.

Moreover, I'm not sure whether this code conflicts
with the LGPL license of htmlparser.

So...
I need your help. Do you think I should commit these changes?
And what should I change in build.xml because of them?

I'd appreciate any advice.
Regards,
Hideaki Kimura


---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-dev-help@jakarta.apache.org

Re: In order to update htmlparser and isolate it

Posted by Hideaki Kimura <hk...@be.to>.

Hi,

> > So.....
> > I need your help. Do you think I may commit them?
> 
> The best way to provide patches etc to update JMeter is to create a
> Bugzilla issue, and then you can attach files to it.
> Please use unified diff format for patches - new files can be uploaded as is.
I have filed a bug in Bugzilla and attached my patches.
Could you take a look at it?

> > And, what should I change in build.xml due to them?
> 
> We'll do this if required.
It would be very helpful if you could do that.
I don't fully understand JMeter's build process yet.


Regards,
Hideaki



Re: In order to update htmlparser and isolate it

Posted by sebb <se...@gmail.com>.

On 23/03/06, Hideaki Kimura <hk...@be.to> wrote:
> However, I'm not sure whether I should commit it.
> While these changes make it possible to use htmlparser 1.6 in a
> BeanShell script, they also make it impossible to use htmlparser 1.3.
> I tried to write an HtmlParserHTMLParser that works with both 1.3
> and 1.6, but it is not possible. Too much has changed in
> htmlparser, such as the package of "Tag".

No point, anyway.

> Also, the build and release process has to be
> changed to some degree, since htmlparser has to be separated
> from JMeter.

Agreed.

> Moreover, I'm not sure whether this code conflicts
> with the LGPL license of htmlparser.

So long as we don't include any of the htmlparser jars in the
distribution, it should be OK to add code to call htmlparser.

> So...
> I need your help. Do you think I should commit these changes?

The best way to provide patches etc to update JMeter is to create a
Bugzilla issue, and then you can attach files to it.

Please use unified diff format for patches - new files can be uploaded as is.

> And what should I change in build.xml because of them?

We'll do this if required.

> I'd appreciate any advice.
> Regards,
> Hideaki Kimura