Posted to j-users@xalan.apache.org by Peter Hollas <pe...@gmail.com> on 2006/11/29 12:50:23 UTC

XHTML link tag stripping

Hi everyone,

Please could someone provide an example stylesheet showing how to strip <a>
link elements (including their text) out of a source XHTML document whilst
retaining the remaining text from within the body? Preferably the output
should have normalised whitespace and a single space separating each
extracted piece of text, e.g.

Source:

<html>
<head>
<title>Not wanted</title>
</head>
<body>
<a>Not wanted</a>
<div class="1">This text is wanted <a href="#">Not wanted</a> and so is
this</div>
<p>Wanted</p>
</body>
</html>


Output:

<htmltext>This text is wanted and so is this Wanted</htmltext>

I'm sure that the solution is incredibly simple, but after days of trying I
keep hitting a brick wall.

Many thanks, Peter.
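
One possible approach, offered as a minimal sketch rather than a definitive answer: select every text node under the body that has no <a> ancestor, append a space after each one, and pass the collected string through normalize-space(). The class name LinkStripper and the method extractText are hypothetical names chosen for illustration. The sketch uses the standard JAXP API (which Xalan-J implements) and matches elements by local-name(), on the assumption that the real XHTML input may or may not declare the XHTML namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name is illustrative only.
class LinkStripper {

    // XSLT 1.0 stylesheet: gather body text that is not inside an <a>,
    // put a space after each piece, then normalise the whitespace.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/'>",
        "    <htmltext>",
        "      <xsl:variable name='raw'>",
        // local-name() is used so the match works with or without the XHTML namespace
        "        <xsl:for-each select=\"//*[local-name()='body']//text()"
            + "[not(ancestor::*[local-name()='a'])]\">",
        "          <xsl:value-of select='.'/>",
        "          <xsl:text> </xsl:text>",
        "        </xsl:for-each>",
        "      </xsl:variable>",
        "      <xsl:value-of select='normalize-space($raw)'/>",
        "    </htmltext>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to an XHTML string and returns the serialised result.
    static String extractText(String xhtml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String source = String.join("\n",
            "<html>",
            "<head>",
            "<title>Not wanted</title>",
            "</head>",
            "<body>",
            "<a>Not wanted</a>",
            "<div class=\"1\">This text is wanted <a href=\"#\">Not wanted</a> and so is",
            "this</div>",
            "<p>Wanted</p>",
            "</body>",
            "</html>");
        System.out.println(extractText(source));
    }
}
```

The stylesheet held in the XSLT constant can also be saved to a file on its own and run through the Xalan command-line processor. If the input carries a DOCTYPE, the parser may try to fetch the DTD, which this sketch does not handle.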