Posted to user@ant.apache.org by gregsmit <gr...@us.ibm.com> on 2008/05/14 23:00:56 UTC
Ant task walk html and find broken links
Hi,
I thought something like this might already exist, but I haven't been able
to find anything yet.
Does anyone know of an Ant task that I could use to walk through a website
(that I built with ant) to confirm that there are no broken links? I found
one really old project on sourceforge, but it looks pretty abandoned.
Thanks for any info,
Greg
--
View this message in context: http://www.nabble.com/Ant-task-walk-html-and-find-broken-links-tp17240744p17240744.html
Sent from the Ant - Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
For additional commands, e-mail: user-help@ant.apache.org
Re: Ant task walk html and find broken links
Posted by Dominique Devienne <dd...@gmail.com>.
On Wed, May 14, 2008 at 4:00 PM, gregsmit <gr...@us.ibm.com> wrote:
> Does anyone know of an Ant task that I could use to walk through a website
> (that I built with ant) to confirm that there are no broken links? I found
> one really old project on sourceforge, but it looks pretty abandoned.
I wrote one a long time ago based on NekoHTML to do the HTML parsing,
because all the ones I could find were online only, and thus checked
public internet links only. I only made mine verify the link fragments
(#id) could be found in the link target (I was checking documentation
cross-references).
Unlike Canoo, it doesn't attempt to process JavaScript. Mine was
simple-minded: it looked only at <a href>, <link href>, and <img
src>, and had filters to skip links based on patterns (for example, to
restrict checking to local relative links and skip http:
links).
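As a rough illustration of the approach described above (collect the <a href>, <link href> and <img src> values, then confirm that each #fragment exists as an id in the target page), here is a minimal Java sketch. It is not the HtmlLinkChecker code: it swaps NekoHTML for a regular expression to stay dependency-free, it takes the site as an in-memory map of file name to content, and the names LinkScanSketch, extractLinks, hasId and badLinks are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the HtmlLinkChecker from this post. */
final class LinkScanSketch {

    // Captures href="..." in <a>/<link> tags and src="..." in <img> tags.
    // Assumption: double-quoted attributes only. A real HTML parser such as
    // NekoHTML copes with markup that this pattern misses.
    private static final Pattern LINK = Pattern.compile(
        "<(?:a|link)\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\""
        + "|<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    /** Returns the link values found in the page, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return links;
    }

    /** Returns true if some element in the page carries id="fragment". */
    static boolean hasId(String html, String fragment) {
        Pattern id = Pattern.compile(
            "\\sid\\s*=\\s*\"" + Pattern.quote(fragment) + "\"");
        return id.matcher(html).find();
    }

    /** Lists the links of one page whose target file or fragment is missing. */
    static List<String> badLinks(String pageName, Map<String, String> pages) {
        List<String> bad = new ArrayList<>();
        for (String link : extractLinks(pages.get(pageName))) {
            if (link.contains(":")) {
                continue; // has a scheme (http:, mailto:); skipped in this sketch
            }
            int hash = link.indexOf('#');
            String target = hash < 0 ? link : link.substring(0, hash);
            String fragment = hash < 0 ? "" : link.substring(hash + 1);
            if (target.isEmpty()) {
                target = pageName; // "#id" points into the current page
            }
            String targetContent = pages.get(target);
            if (targetContent == null) {
                bad.add(link); // target file does not exist
            } else if (!fragment.isEmpty() && !hasId(targetContent, fragment)) {
                bad.add(link); // file exists but lacks the id
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new HashMap<>();
        pages.put("book.html",
            "<a href=\"chapter1.html#section1\">ok</a>"
            + "<a href=\"chapter1.html#sectionOne\">bad</a>");
        pages.put("chapter1.html", "<h2 id=\"section1\">Section 1</h2>");
        System.out.println(badLinks("book.html", pages)); // prints [chapter1.html#sectionOne]
    }
}
```

Keeping the pages in a map makes the check easy to try without touching the file system; a real task would read the files from a fileset instead.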
This code is old, and hasn't been compiled or run in ages, but
apparently I unit tested it, so might still be useful ;-) I'm happy to
share the code (although it uses a few utility classes, so not easy to
extract the relevant pieces).
That's assuming Canoo is not a good fit here. My stuff probably pales
in comparison, but I'm throwing it out there just in case it might be
useful.
--DD
/**
 * Checks an HTML page for bad links.
 * <p>
 * Uses <a href="http://www.apache.org/~andyc/neko/doc/html/">NekoHTML</a>,
 * but could also use <a href="http://jtidy.sourceforge.net/">JTidy</a> I guess.
 * <p>
 * Current limitations:
 * <ul>
 * <li>Cannot indicate line/column of the bad link</li>
 * <li>Does not support re-basing of document</li>
 * <li>Does not check URL in stylesheets</li>
 * <li>Slow!?</li>
 * </ul>
 *
 * @version May 2004
 */
public class HtmlLinkChecker extends ConditionalAspect.AbstractTask { ... }
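If the "re-basing" limitation listed above refers to the HTML <base href> element (an assumption; the post does not say), the effect of a base on link resolution can be sketched with java.net.URI. The class name RebaseSketch, the method resolve and the example URIs are hypothetical and are not part of HtmlLinkChecker.

```java
import java.net.URI;

/** Hypothetical illustration of resolving links with and without a <base href>. */
final class RebaseSketch {

    /**
     * Resolves a link against the page's own URI, or against the value of a
     * <base href> when the page declares one (pass null when it does not).
     */
    static URI resolve(URI page, String baseHref, String link) {
        URI base = (baseHref == null) ? page : page.resolve(baseHref);
        return base.resolve(link);
    }

    public static void main(String[] args) {
        URI page = URI.create("file:/site/docs/book.html");
        // Without a base, the link is relative to the page itself.
        System.out.println(resolve(page, null, "chapter1.html#section1"));
        // With a base, the same link points somewhere else entirely.
        System.out.println(resolve(page, "http://example.org/manual/",
                                   "chapter1.html#section1"));
    }
}
```

A checker that ignores <base href> would look for chapter1.html next to book.html in both cases, which is why the second page could be reported incorrectly.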
<?xml version="1.0"?>
<project name="HtmlLinkCheckerTest" default="tearDown"
         xmlns:bm="antlib:buildmagic">

  <target name="setUp">
    <property name="tmp" location="${basedir}/${ant.project.name}.tmp" />
    <mkdir dir="${tmp}" />
  </target>

  <target name="tearDown">
    <delete dir="${tmp}" />
  </target>

  <!-- Creates a few dummy HTML files, which by default have no bad links.
       Just override one of the properties to force some kind of bad link. -->
  <target name="setUpFiles" depends="setUp">
    <property name="google.link" value="http://www.google.com" />
    <property name="logo.file" value="logo.gif" />
    <property name="bullet.file" value="bullet.gif" />
    <property name="style.file" value="style.css" />
    <property name="book.file" value="book.html" />
    <property name="chapter1.file" value="chapter1.html" />
    <property name="section1.id" value="section1" />
    <property name="sectionA.id" value="sectionA" />
    <property name="coucou.id" value="coucou" />

    <echo file="${tmp}/logo.gif">I am a logo!</echo>
    <echo file="${tmp}/bullet.gif">I am a bullet!</echo>
    <echo file="${tmp}/style.css">
      p { color: #000000 }
      ul { list-style: url(${bullet.file}) }
    </echo>

    <echo file="${tmp}/book.html"><![CDATA[
      <html>
        <body>
          <a href="${google.link}">Search:</a>
          <p id="coucou">coucou</p>
          <a href="${chapter1.file}">Chapter 1</a>
          <a href="${chapter1.file}#${section1.id}">Section 1</a>
          <a href="${chapter1.file}#section2">Section 2</a>
          <a href="chapter2.html">Chapter 2</a>
        </body>
      </html>
    ]]></echo>

    <echo file="${tmp}/chapter1.html"><![CDATA[
      <html>
        <head>
          <link href="${style.file}" rel="stylesheet">
        </head>
        <body>
          <h2 id="section1">Section #1</h2>
          <h2 id="section2">Section #2</h2>
          <a href="book.html#${coucou.id}">Book Index</a>
        </body>
      </html>
    ]]></echo>

    <echo file="${tmp}/chapter2.html"><![CDATA[
      <html>
        <head>
          <link href="${style.file}" rel="stylesheet">
        </head>
        <body>
          <img src="${logo.file}">
          See <a href="#${sectionA.id}">Section A</a>
          <h2 id="sectionA">Section A</h2>
          <h2 id="sectionB">Section B</h2>
          <a href="${book.file}">Book Index</a>
        </body>
      </html>
    ]]></echo>
  </target>

  <target name="test-generic" depends="setUpFiles">
    <bm:checklinks verbose="true">
      <bm:fileset dir="${tmp}" includes="*.html" />
    </bm:checklinks>
  </target>

  <target name="test-patterns" depends="setUpFiles">
    <bm:checklinks verbose="false">
      <bm:fileset dir="${tmp}" includes="*.html" />
      <bm:linkpatterns>
        <bm:include regexp=".*/images/.*" ifTrue="${+imgs}" />
        <bm:exclude prefix="chapterOne.html" ifTrue="${-chap1}" />
        <bm:exclude regexp=".*#.*" ifTrue="${-frag}" />
        <bm:exclude prefix="http:" ifTrue="${-http}" />
      </bm:linkpatterns>
    </bm:checklinks>
  </target>

</project>
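The <bm:linkpatterns> element in the build file above combines include and exclude rules, each matching by prefix or by regexp. How buildmagic actually evaluates them is not shown in this post, so the following Java sketch rests on assumptions: an exclude always wins, an empty include list means "check everything", and a regexp must match the whole link. The class LinkFilterSketch and its methods are hypothetical names. The ifTrue attribute seems to switch a rule on only when the named property is true; in the sketch the caller simply adds a rule or leaves it out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical include/exclude filter; the semantics are assumed, not buildmagic's. */
final class LinkFilterSketch {
    private final List<Predicate<String>> includes = new ArrayList<>();
    private final List<Predicate<String>> excludes = new ArrayList<>();

    LinkFilterSketch includeRegexp(String regexp) {
        Pattern p = Pattern.compile(regexp);
        includes.add(link -> p.matcher(link).matches()); // whole-link match (assumption)
        return this;
    }

    LinkFilterSketch excludeRegexp(String regexp) {
        Pattern p = Pattern.compile(regexp);
        excludes.add(link -> p.matcher(link).matches());
        return this;
    }

    LinkFilterSketch excludePrefix(String prefix) {
        excludes.add(link -> link.startsWith(prefix));
        return this;
    }

    /** A link is checked if no exclude matches and, when includes exist, at least one matches. */
    boolean shouldCheck(String link) {
        if (excludes.stream().anyMatch(rule -> rule.test(link))) {
            return false;
        }
        return includes.isEmpty()
            || includes.stream().anyMatch(rule -> rule.test(link));
    }

    public static void main(String[] args) {
        // Roughly what test-patterns does when -http and -frag are set.
        LinkFilterSketch filter = new LinkFilterSketch()
            .excludePrefix("http:")
            .excludeRegexp(".*#.*");
        System.out.println(filter.shouldCheck("http://zzz.google.com"));    // prints false
        System.out.println(filter.shouldCheck("chapter1.html#sectionOne")); // prints false
        System.out.println(filter.shouldCheck("chapter1.html"));            // prints true
    }
}
```

Under these assumptions, adding includeRegexp(".*/images/.*") would restrict checking to image paths only, which is consistent with what testCheckImagesOnly below expects.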
public class HtmlLinkCheckerTest
        extends BuildFileTestCase {

    /**
     * Tests all the links are OK.
     * Note that it doesn't tell us if some links are not checked...
     * Note also that it requires an internet connection to go to Google.
     */
    public void testGoodLinks() {
        executeTarget("test-generic");
    }

    public void testBadExternalHttpLink() {
        setProperty("google.link", "http://zzz.google.com");
        expectSpecificBuildException("test-generic", "bad external http link",
                                     "1 bad link(s)");
        assertBadLink("http://zzz.google.com");
    }

    public void testBadInternalFileLink() {
        setProperty("google.link", "book.html");
        setProperty("chapter1.file", "chapterOne.html");
        expectSpecificBuildException("test-generic", "bad internal file link",
                                     "3 bad link(s)");
        assertBadLink("chapterOne.html");
        assertBadLink("chapterOne.html#section1");
        assertBadLink("chapterOne.html#section2");
    }

    public void testBadInternalFileFragment() {
        setProperty("google.link", "book.html");
        setProperty("section1.id", "sectionOne");
        expectSpecificBuildException("test-generic", "bad internal file frag",
                                     "1 bad link(s)");
        assertBadLink("chapter1.html#sectionOne");
    }

    public void testBadSelfFragment() {
        setProperty("google.link", "book.html");
        setProperty("sectionA.id", "sectionABC");
        expectSpecificBuildException("test-generic", "bad self frag",
                                     "1 bad link(s)");
        assertBadLink("#sectionABC");
    }

    public void testBadHeadLink() {
        setProperty("google.link", "book.html");
        setProperty("style.file", "stylesheet.CSS");
        expectSpecificBuildException("test-generic", "bad head link",
                                     "1 bad link(s)");
        assertBadLink("stylesheet.CSS");
    }

    public void testBadUrlInCss() {
        setProperty("google.link", "book.html");
        setProperty("bullet.file", "square.gif");
        try {
            expectSpecificBuildException("test-generic", "bad url in css",
                                         "1 bad link(s)");
            assertBadLink("square.gif");
        }
        catch (junit.framework.AssertionFailedError e) {
            // TODO: implement CSS link checks
        }
    }

    public void testBadImage() {
        setProperty("google.link", "book.html");
        setProperty("logo.file", "logo.jpg");
        expectSpecificBuildException("test-generic", "bad image",
                                     "1 bad link(s)");
        assertBadLink("logo.jpg");
        //System.out.println(getLog());
        //System.out.println(getOutput());
        //System.out.println(getFullLog());
        //System.err.println(getError());
    }

    public void testIgnoreBadInternalFileLink() {
        setProperty("google.link", "book.html");
        setProperty("chapter1.file", "chapterOne.html");
        setProperty("-chap1", "true");
        executeTarget("test-patterns");
    }

    public void testIgnoreBadExternalHttpLink() {
        setProperty("-http", "true");
        setProperty("google.link", "http://zzz.google.com");
        executeTarget("test-patterns");
    }

    public void testIgnoreBadFragments() {
        setProperty("-frag", "true");
        setProperty("google.link", "book.html");
        setProperty("section1.id", "sectionOne");
        setProperty("sectionA.id", "sectionABC");
        executeTarget("test-patterns");
    }

    public void testCheckImagesOnly() {
        setProperty("+imgs", "true");
        setProperty("google.link", "book.html");
        // Creates a few broken links, to be ignored (since not checked)
        setProperty("section1.id", "sectionOne");
        setProperty("sectionA.id", "sectionABC");
        setProperty("chapter1.file", "chapterOne.html");
        executeTarget("test-patterns");
    }

    private void setProperty(String name, String value) {
        getProject().setNewProperty(name, value);
    }

    private void assertBadLink(String link) {
        assertTrue(getLog().indexOf(": " + link + ":") > -1);
    }

} // END class HtmlLinkCheckerTest
RE: Ant task walk html and find broken links
Posted by gregsmit <gr...@us.ibm.com>.
Hi guys,
Just looking at the front page of the Canoo website -- this looks like the
right tool to use -- I'll be checking it out.
Thanks!
Greg
ruel loehr wrote:
>
> This is the correct answer. You are talking about doing functional
> testing. This is the perfect tool for doing so.
> ________________________________________
> From: Gilbert Rebhan [ant@schillbaer.de]
> Sent: Wednesday, May 14, 2008 4:15 PM
> To: Ant Users List
> Subject: Re: Ant task walk html and find broken links
>
> Scot P. Floess wrote:
>> Interesting question...
>>
>> I know I need something like this too...and actually got bitten by
>> generating some bad HTML.
>>
>> I considered using Selenium to test this...but that isn't an Ant task :(
>
> What about Canoo?
> http://webtest.canoo.com/webtest/manual/WebTestHome.html
>
> I haven't tried it yet, but it sounds promising.
>
> Regards, Gilbert
--
View this message in context: http://www.nabble.com/Ant-task-walk-html-and-find-broken-links-tp17240744p17242021.html
Sent from the Ant - Users mailing list archive at Nabble.com.
RE: Ant task walk html and find broken links
Posted by "Loehr, Ruel" <rl...@pointserve.com>.
This is the correct answer. You are talking about doing functional testing. This is the perfect tool for doing so.
________________________________________
From: Gilbert Rebhan [ant@schillbaer.de]
Sent: Wednesday, May 14, 2008 4:15 PM
To: Ant Users List
Subject: Re: Ant task walk html and find broken links
Scot P. Floess wrote:
> Interesting question...
>
> I know I need something like this too...and actually got bitten by
> generating some bad HTML.
>
> I considered using Selenium to test this...but that isn't an Ant task :(
What about Canoo?
http://webtest.canoo.com/webtest/manual/WebTestHome.html
I haven't tried it yet, but it sounds promising.
Regards, Gilbert
Re: Ant task walk html and find broken links
Posted by Gilbert Rebhan <an...@schillbaer.de>.
Scot P. Floess wrote:
> Interesting question...
>
> I know I need something like this too...and actually got bitten by
> generating some bad HTML.
>
> I considered using Selenium to test this...but that isn't an Ant task :(
What about Canoo?
http://webtest.canoo.com/webtest/manual/WebTestHome.html
I haven't tried it yet, but it sounds promising.
Regards, Gilbert
Re: Ant task walk html and find broken links
Posted by "Scot P. Floess" <fl...@mindspring.com>.
Interesting question...
I know I need something like this too...and actually got bitten by
generating some bad HTML.
I considered using Selenium to test this...but that isn't an Ant task :(
gregsmit wrote:
> Hi,
>
> I thought something like this might already exist, but I haven't been able
> to find anything yet.
>
> Does anyone know of an Ant task that I could use to walk through a website
> (that I built with ant) to confirm that there are no broken links? I found
> one really old project on sourceforge, but it looks pretty abandoned.
>
> Thanks for any info,
> Greg
>
>
--
Scot P. Floess
27 Lake Royale
Louisburg, NC 27549
252-478-8087 (Home)
919-754-4592 (Work)
Chief Architect JPlate http://sourceforge.net/projects/jplate
Chief Architect JavaPIM http://sourceforge.net/projects/javapim
Architect Keros http://sourceforge.net/projects/keros
Re: Ant task walk html and find broken links
Posted by Steve Loughran <st...@apache.org>.
gregsmit wrote:
>
> Hi,
>
> I thought something like this might already exist, but I haven't been able
> to find anything yet.
>
> Does anyone know of an Ant task that I could use to walk through a website
> (that I built with ant) to confirm that there are no broken links? I found
> one really old project on sourceforge, but it looks pretty abandoned.
I use HttpUnit for this, as it audits the web site during your test
runs. With Java 6 it can run JavaScript code too.
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/