Posted to jmeter-dev@jakarta.apache.org by bu...@apache.org on 2006/03/24 02:59:17 UTC

DO NOT REPLY [Bug 39092] New: - htmlparser should be updated and isolated

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=39092>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=39092

           Summary: htmlparser should be updated and isolated
           Product: JMeter
           Version: 2.1.1
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTTP
        AssignedTo: jmeter-dev@jakarta.apache.org
        ReportedBy: hkimura@be.to


JMeter uses htmlparser 1.3, not the latest version, 1.6,
which fixes many bugs and has powerful NodeFilters.
Simply replacing htmlparser.jar in the JMeter distribution with the latest
htmlparser doesn't work, because HtmlParserHTMLParser.java uses APIs that
are incompatible with it.
This makes using htmlparser in the BeanShell Sampler a little difficult.
This is why JMeter should UPDATE the htmlparser.

However, htmlparser is under the LGPL while JMeter is under the
Apache License, so before the donated htmlparser code can be updated
to the latest version, 1.6, JMeter has to work well without
htmlparser.
This is why JMeter should ISOLATE the htmlparser.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 02:00 -------
Created an attachment (id=17962)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17962&action=view)
a patch for HtmlParserHTMLParser to UPDATE the htmlparser




DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.


sebb@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




------- Additional Comments From sebb@apache.org  2006-03-25 00:34 -------
That works better, though I had to change:

baseUrl.url = new URL(baseUrl.url, baseHref.getBaseUrl() + "/");

to

baseUrl.url = new URL(baseUrl.url, baseHref.getBaseUrl());

to avoid getting  // in URLs.
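
The effect of the extra "/" can be reproduced with plain java.net.URL. The
following is only a sketch: the class name, the helper and the example.com
base are made up for illustration and are not taken from the patch.

```java
import java.net.MalformedURLException;
import java.net.URL;

class BaseHrefSlashDemo {
    // Resolves a relative link against a base, using the same
    // new URL(context, spec) constructor as the line quoted above.
    static String resolve(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String baseHref = "http://example.com/dir/"; // hypothetical BASE HREF value

        // Appending "/" to a base that already ends with "/" leaves an
        // empty path segment, which survives into every resolved link.
        System.out.println(resolve(baseHref + "/", "page.html"));

        // Using the base as it is resolves without the doubled slash.
        System.out.println(resolve(baseHref, "page.html"));
    }
}
```

java.net.URL removes "." and ".." segments while resolving but does not
collapse an empty segment, so the doubled slash has to be avoided at the
point where the base is built.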

==

By the way, I found I only needed htmlparser.jar for compiling and running.



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 22:00 -------
!!!
So sorry, silly mistake.
It just overwrites the pointer, not the pointee.
I'll fix it right now.

(In reply to comment #14)
> There's a problem with the parsing code - it does not seem to handle BASEREF
> tags properly. They are detected, but the new base is not saved for
> subsequent tags.
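
The patch itself is not shown in this thread, so the following is only one
plausible reading of "overwrites the pointer, not the pointee". In Java,
rebinding a method parameter changes the local reference only, while
updating a field of a shared holder object is visible to later code. The
UrlHolder class and both method names are hypothetical; the holder is
modelled loosely on the baseUrl.url expression quoted elsewhere in this bug.

```java
import java.net.MalformedURLException;
import java.net.URL;

class BaseUrlHolderDemo {
    // Hypothetical holder: a mutable box around the current base URL.
    static class UrlHolder {
        URL url;
        UrlHolder(URL url) { this.url = url; }
    }

    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Overwrites the "pointer": only the local parameter is rebound,
    // so the caller never sees the new base.
    static void rebindParameter(URL base, String newBase) {
        base = parse(newBase);
    }

    // Overwrites the "pointee": the caller shares the holder, so the
    // new base is still there when subsequent tags are processed.
    static void updateHolder(UrlHolder holder, String newBase) {
        holder.url = parse(newBase);
    }

    public static void main(String[] args) {
        URL original = parse("http://example.com/a/");
        rebindParameter(original, "http://example.com/b/");
        System.out.println("after rebinding: " + original);

        UrlHolder holder = new UrlHolder(original);
        updateHolder(holder, "http://example.com/b/");
        System.out.println("after updating holder: " + holder.url);
    }
}
```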





DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 03:23 -------
Was it in an incorrect format? If so, sorry.
I used WinMerge to make the patches.
Anyway, I uploaded the whole file.

(In reply to comment #10)
> I think I can get round the compilation problem.
> 
> However, the problem I have at the moment is that the patch does not work for me.
> 
> Please can you attach the full new parser file?





DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 02:08 -------
These patches enable JMeter to work with htmlparser 1.6, and to keep working
even if htmlparser.jar can't be found.

As for the development environment,
src/htmlparser should be deleted, and the related entries in build.xml or
eclipse.classpath should be removed.
Instead,
  filterbuilder.jar
  htmllexer.jar
  htmlparser.jar
  sax2.jar
  thumbelina.jar
included in the latest htmlparser
  http://sourceforge.net/project/showfiles.php?group_id=24399&package_id=47712
should be included in the classpath for compiling.

As for the binary build, htmlparser.jar should no longer be included, so that
users can install htmlparser as an option.
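
One common way to keep working when an optional jar is absent is to probe
for one of its classes before choosing an implementation. The sketch below
illustrates that idea only; it is not the attached patch. The two
"example." parser names are hypothetical, while org.htmlparser.Parser is
the main class of the htmlparser library.

```java
class OptionalParserDemo {
    // Returns true if the named class can be found on the classpath.
    // The class is located but not initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                    OptionalParserDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks the preferred parser only when the class it depends on is present.
    static String chooseParser(String preferred, String requiredClass,
                               String fallback) {
        return isClassAvailable(requiredClass) ? preferred : fallback;
    }

    public static void main(String[] args) {
        String chosen = chooseParser(
                "example.HtmlParserBackedParser",  // hypothetical name
                "org.htmlparser.Parser",           // provided by htmlparser.jar
                "example.RegexpBackedParser");     // hypothetical name
        System.out.println("Parser to use: " + chosen);
    }
}
```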



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From sebb@apache.org  2006-03-25 14:41 -------
(In reply to comment #20)
> And, I found one more bug of HtmlParserHTMLParser, in Line 163
>    if (tag.getAttribute("rel").equalsIgnoreCase("stylesheet")) {
> should be
>    if (tag.getAttribute("rel") != null &&
> tag.getAttribute("rel").equalsIgnoreCase("stylesheet")) {
> NPE happens during retrieving http://db.apache.org/ or other sites which have
> "</link>" (xhtml).

OK.
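
The null check proposed in comment #20 can also be written by calling
equalsIgnoreCase on the constant, which Java defines to return false for a
null argument. A minimal sketch in plain Java, with a hypothetical helper
name and no htmlparser dependency:

```java
class RelAttributeDemo {
    // rel may be null when the tag has no "rel" attribute, for example
    // an end tag such as </link>.
    static boolean isStylesheet(String rel) {
        // Calling the method on the literal avoids dereferencing rel.
        return "stylesheet".equalsIgnoreCase(rel);
    }

    public static void main(String[] args) {
        System.out.println(isStylesheet("StyleSheet"));
        System.out.println(isStylesheet("icon"));
        System.out.println(isStylesheet(null));
    }
}
```

This also calls getAttribute("rel") once instead of twice when used in the
parser.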

We're not interested in end tags - perhaps they should be filtered out.

BTW, why use (tagname.equalsIgnoreCase("LINK")) rather than (tag instanceof
LinkTag?)
 
> > I've not decided how best to build/test the new class automatically yet - at
> > present I'm using a separate Eclipse project, which is not ideal.
> > But in the meantime, please try it, and try to break it...
> I agree with you. It's better to think about how to provide this
> , though I feel like asking the htmlparser team again at some day.

If they are willing to additionally licence the binary under an ASF-compatible
license, then it would probably solve all the problems...



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.


sebb@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




------- Additional Comments From sebb@apache.org  2006-03-24 02:11 -------
Thanks, I'll take a look at this shortly



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 03:19 -------
Created an attachment (id=17964)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17964&action=view)
a whole source code of new HTMLParserHTMLParser




DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-25 14:59 -------
> BTW, why use (tagname.equalsIgnoreCase("LINK")) rather than (tag instanceof
> LinkTag?)
That is because LinkTag represents the "A" tag, not the "LINK" tag.
htmlparser 1.3 has "LinkTagTag" for that, but 1.6 doesn't.

> If they are willing to additionally licence the binary under an ASF-compatible
> license, then it would probably solve all the problems...
Yeah, I hope so.
As Peter said, I think they are very kind guys.



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 02:00 -------
Created an attachment (id=17963)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17963&action=view)
a patch for HTMLParser to ISOLATE the htmlparser




DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From woolfel@yahoo.com  2006-03-24 02:20 -------
There is a significant downside to asking users to download HTMLParser from
sourceforge. Many users complain about this, so it needs to be documented
clearly. We've seen this with the Webservice sampler, which requires users
to download external jars. I disagree with deleting htmlparser from the src
directory. The htmlparser developers were kind enough to donate a snapshot
under the Apache license, and I still find it valuable. Instead, we should
make it configurable, or get rid of JTidy and htmlparser altogether. We
currently have JTidy, regexp and htmlparser. The original reason for using
htmlparser is that it's easier to use than JTidy and not significantly slower
than regexp.

my 2 cents on the issue. 



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-25 13:48 -------
Created an attachment (id=17978)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17978&action=view)
A jmx file I used for testing the combination of BeanShell Sampler and
htmlparser1.6

Incidentally, I would like to show how valuable the combination of
htmlparser 1.6 and the BeanShell Sampler is.
The attached jmx file demonstrates it.
It accesses www.apache.org and retrieves the link tags with this BeanShell
code:

import org.htmlparser.Parser;
import org.htmlparser.filters.*;

parser = new Parser();
parser.setInputHTML(new String(ctx.getPreviousResult().getResponseData(),
    "iso-8859-1"));
// Pick up the Apache sites shown on the left.
aTags = parser.parse(new AndFilter(new TagNameFilter("td"),
        new HasAttributeFilter("class", "navleft")))
    .extractAllNodesThatMatch(new LinkRegexFilter("http://.*/"), true);
// htmlparser's NodeFilters are very cool!

After that, it puts them into variables:

for (i = 0; i < aTags.size() && i < 10; ++i) {
  href = aTags.elementAt(i).getAttribute("href");
  // Strip the leading "http://" and the trailing "/".
  server = href.substring("http://".length(), href.length() - 1);
  log.info("The server of found site is : " + server);
  vars.put("SITES_SERVER_" + (i + 1), server);
}

Then a ForEach Controller and a parameterized HTTP Sampler call
each retrieved URL.
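
The host extraction in the loop above can be tried outside JMeter. The
following plain Java sketch assumes, as the script does, that every href
starts with "http://" and ends with "/". A Map stands in for JMeter's vars
object, and the class name, method names and sample hosts are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SiteServerDemo {
    // Strips the leading "http://" and the trailing "/", as the script does.
    static String extractServer(String href) {
        return href.substring("http://".length(), href.length() - 1);
    }

    // Builds SITES_SERVER_1, SITES_SERVER_2, ... for at most 'limit' links.
    static Map<String, String> toVariables(List<String> hrefs, int limit) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (int i = 0; i < hrefs.size() && i < limit; ++i) {
            vars.put("SITES_SERVER_" + (i + 1), extractServer(hrefs.get(i)));
        }
        return vars;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://a.example/", "http://b.example/"); // sample input
        System.out.println(toVariables(hrefs, 10));
    }
}
```

The _1, _2, ... suffixes matter because the ForEach Controller iterates over
variables named with a common prefix followed by an underscore and a running
number.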

The above is only an example, but this approach offers unprecedented
flexibility, maintainability and ease of correlation.
Nothing like it has been provided by LoadRunner/Rational or any other test tool!
I'm so grateful to the developers of htmlparser and the BeanShell Sampler.
Really exciting!



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-04-03 15:31 -------
Good news!
Mr. Derrick Oswald, the lead developer of htmlparser, has generously
permitted the JMeter project to redistribute htmlparser.jar.

> You are welcome to use the HTML Parser, in either binary or source code
> form, to be included with JMeter.
> 
> Sincerely,
> 
> Derrick Oswald
> HTML Parser Lead Programmer

Thanks for the advice, Peter.



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-25 10:07 -------
Thanks for pointing that out.

As you said, it seems to need only htmlparser.jar.
It worked in my environment, too.

If you can get round the compilation problem and are going to
add a new class instead of replacing the existing one,
please discard my changes to HTMLParser.java, remove
the isValid() function from the new HtmlParserHTMLParser.java,
and give the class a new name, like HtmlParserHTMLParser16(?) .

I apologize if you have already done or planned this (most likely so...)


(In reply to comment #17)
> That works better, though I had to change:
> 
> baseUrl.url = new URL(baseUrl.url, baseHref.getBaseUrl() + "/");
> 
> to
> 
> baseUrl.url = new URL(baseUrl.url, baseHref.getBaseUrl());
> 
> to avoid getting  // in URLs.
> 
> ==
> 
> By the way, I found I only needed htmlparser.jar for compiling and running.





DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.


sebb@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From sebb@apache.org  2006-05-26 00:11 -------
I've added a new parser class:  HtmlParserHTMLParser16 to the 2.1 branch

Just set the parser property accordingly, and replace the htmlparser jar with
version 1.6 (or later)



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-24 03:20 -------
Created an attachment (id=17965)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17965&action=view)
a whole source code of new HTMLParser




DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From sebb@apache.org  2006-03-25 11:39 -------
Created an attachment (id=17977)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17977&action=view)
htmlparser source and class

The attached jar contains my version of the source, and the compiled class.

I've kept the same name as the existing class.
To use it:
Replace the jmeter htmlparser.jar with the SF one.
Put the htmlparserpaser.jar in the lib directory.
You may also need to delete the htmlparserhtmlparser.class file from the JMeter
http jar.
I've not decided how best to build/test the new class automatically yet - at
present I'm using a separate Eclipse project, which is not ideal.
But in the meantime, please try it, and try to break it...



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From hkimura@be.to  2006-03-25 13:30 -------
> Replace the jmeter htmlparser.jar with the SF one.
> Put the htmlparserpaser.jar in the lib directory.
> You may also need to delete the htmlparserhtmlparser.class file from the Jmeter
> http jar.
All right.
It worked well, though I needed to delete the htmlparserhtmlparser.class as you
said.

Also, I found one more bug in HtmlParserHTMLParser, at line 163:
   if (tag.getAttribute("rel").equalsIgnoreCase("stylesheet")) {
should be
   if (tag.getAttribute("rel") != null &&
       tag.getAttribute("rel").equalsIgnoreCase("stylesheet")) {
An NPE happens when retrieving http://db.apache.org/ or other sites which have
"</link>" (xhtml), because the end tag has no "rel" attribute.

> I've not decided how best to build/test the new class automatically yet - at
> present I'm using a separate Eclipse project, which is not ideal.
> But in the meantime, please try it, and try to break it...
I agree with you. It would be better to work out how to provide this,
though I feel like asking the htmlparser team again some day.

Thanks, sebb.



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From sebb@apache.org  2006-03-24 03:12 -------
I think I can get round the compilation problem.

However, the problem I have at the moment is that the patch does not work for me.

Please can you attach the full new parser file?



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.





------- Additional Comments From sebb@apache.org  2006-03-24 02:49 -------
The following property is used to define the parser interface class:

htmlParser.className

so one should be able to create a new class to use the new API - instead of
replacing the existing class as currently proposed.

If a user wants the new parser, then they just download the new jars, and update
the parser property.

OK?
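
Selecting the implementation from a property usually comes down to loading
the named class by reflection. The sketch below shows that pattern in
isolation and is not JMeter's actual loading code: only the property name
htmlParser.className comes from this comment, while the LinkParser
interface and both implementations are hypothetical stand-ins.

```java
import java.util.Properties;

class ParserPropertyDemo {
    // Hypothetical common interface that every parser class implements.
    interface LinkParser {
        String name();
    }

    // Stand-in for the existing class built against the old API.
    static class OldApiParser implements LinkParser {
        public String name() { return "old-api"; }
    }

    // Stand-in for a new class built against the new API.
    static class NewApiParser implements LinkParser {
        public String name() { return "new-api"; }
    }

    // Instantiates whichever class the property names, defaulting to the old one.
    static LinkParser load(Properties props) {
        String className = props.getProperty(
                "htmlParser.className", OldApiParser.class.getName());
        try {
            return (LinkParser) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load parser " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(load(props).name()); // property not set: default class

        props.setProperty("htmlParser.className", NewApiParser.class.getName());
        System.out.println(load(props).name()); // property set: new class
    }
}
```

Because the new class is only referenced by name, the code that loads it
needs no compile-time dependency on the newer library.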



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.


hkimura@be.to changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #17964|0                           |1
        is obsolete|                            |




------- Additional Comments From hkimura@be.to  2006-03-24 22:43 -------
Created an attachment (id=17975)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17975&action=view)
a modified source code of new HTMLParserHTMLParser

sorry, what a silly mistake..



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.


sebb@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




------- Additional Comments From sebb@apache.org  2006-03-24 18:16 -------
There's a problem with the parsing code - it does not seem to handle BASEREF
tags properly. They are detected, but the new base is not saved for subsequent tags.
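To illustrate the behaviour described above, here is a minimal sketch, assuming the comment refers to the HTML <base href="..."> element. The class and method names (BaseTrackingSketch, Tag, resolve) are hypothetical and are not taken from JMeter or htmlparser; the point is only that the new base has to be stored and then used for every tag that follows.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not JMeter or htmlparser code. */
class BaseTrackingSketch {

    /** A simplified tag: its name plus the value of its URL-bearing attribute. */
    static final class Tag {
        final String name;
        final String url;

        Tag(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /**
     * Resolves the URL of every non-base tag, in document order.
     * A base tag produces no output but replaces the base used
     * for all tags that come after it.
     */
    static List<URI> resolve(URI pageUri, List<Tag> tags) {
        URI base = pageUri;                   // initial base is the page itself
        List<URI> resolved = new ArrayList<>();
        for (Tag tag : tags) {
            if (tag.name.equalsIgnoreCase("base")) {
                base = base.resolve(tag.url); // save the new base for later tags
            } else {
                resolved.add(base.resolve(tag.url));
            }
        }
        return resolved;
    }
}
```

With this sketch, an image that appears before the base tag is resolved against the page URL, and an image that appears after it is resolved against the saved base. A parser that detects the base tag but skips the assignment would resolve both against the page URL.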




DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.
http://issues.apache.org/bugzilla/show_bug.cgi?id=39092





------- Additional Comments From hkimura@be.to  2006-03-24 03:08 -------
One issue is that compiling the current HtmlParserHTMLParser requires
htmlparser 1.3, while compiling the new HtmlParserHTMLParser requires 1.6.

Unfortunately, it is impossible to write a new HtmlParserHTMLParser that
compiles against both htmlparser 1.6 and 1.3. The two versions are totally
incompatible, so only one of them can be in the JMeter source code.



DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.
http://issues.apache.org/bugzilla/show_bug.cgi?id=39092





------- Additional Comments From hkimura@be.to  2006-03-24 02:47 -------
Thanks for your concern, Peter.

You're right, downloading manually is troublesome.
But JMeter still works without htmlparser, apart from the performance drop
caused by the slower RegexHTMLParser you mentioned.

The only users who have to download htmlparser manually are those who turn on
"retrieve all" in the HTTP Sampler and also care about the performance of the
HTTP Sampler. In most cases, I think, users won't have to do anything more than
they do now.
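As a rough sketch of how "still works without htmlparser" could look in code: the parser class is looked up by name at run time, and a built-in parser is used when the optional jar is not on the classpath. Everything here (ParserLoaderSketch, LinkParser, FallbackParser, load) is a hypothetical name for illustration, not the actual JMeter API.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not the actual JMeter code. */
class ParserLoaderSketch {

    /** Minimal parser contract that the core code compiles against. */
    interface LinkParser {
        List<String> extractLinks(String html);
    }

    /** Stand-in for a built-in parser that needs no external jar. */
    static final class FallbackParser implements LinkParser {
        @Override
        public List<String> extractLinks(String html) {
            return Collections.emptyList(); // placeholder body for the sketch
        }
    }

    /**
     * Tries to instantiate the named parser class reflectively, so the core
     * has no compile-time dependency on it. If the class (or a library it
     * needs) is missing, or it is not a LinkParser, the fallback is returned.
     */
    static LinkParser load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (LinkParser) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return new FallbackParser();
        }
    }
}
```

Under these assumptions, only the wrapper class that actually calls htmlparser needs the jar at compile time; users who never install it would silently get the slower built-in parser.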

Still, the benefit of using the donated code remains, as you say.
So... how about asking the htmlparser development team again?
Is that too intrusive?

(In reply to comment #5)
> There is a significant downside to ask users to download HTMLParser from
> sourceforge. Many users complain about this, so it needs to be documented
> clearly. We've seen this with the Webservice sampler, which requires users
> download external jars. I disagree with delete htmlparser in the src directory.
> Htmlparser developers were kind enough to donate a snapshot under apache license
> and I still find it valuable. Instead, we should make it configurable, or get
> rid of JTidy and htmlparser all together. We currently have JTidy, regexp and
> htmlparser. The original reason for using htmlparser is it's easier to use than
> JTidy and not significantly slower than regexp.
> 
> my 2 cents on the issue. 





DO NOT REPLY [Bug 39092] - htmlparser should be updated and isolated

Posted by bu...@apache.org.
http://issues.apache.org/bugzilla/show_bug.cgi?id=39092





------- Additional Comments From woolfel@yahoo.com  2006-03-24 02:53 -------
As usual, you have great ideas, sebb. That sounds like a good solution to me.
peter
