Posted to oro-user@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2001/09/29 09:30:35 UTC

Re: How to write regular express to remove the html tag content?

>I select the awk compiler option of oro's demo.html , and write the pattern
>"<head>\w+</head>"
>but nothing result return!! What's error happen?

\w matches only alphanumeric characters plus underscore (it's not
actually standard awk, but gawk and other awk offshoots implement it,
so we do too).  Your input contains characters other than those
matched by \w between the <head> and </head> tags (typically
whitespace, punctuation, or nested tags), so you don't get any
matches.  As a side note, regular expressions are generally not the
best tool for processing HTML and you will probably be better off
using a DOM parser.  However, they can be useful for quick and dirty
hacks (as long as you come up with the right expressions :)
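
To illustrate the point about \w, here is a minimal sketch.  It is
written against java.util.regex rather than ORO's awk compiler, so
the class name, method names, and sample input are all hypothetical,
and the reluctant quantifier (.*?) is Perl-style syntax that should
not be assumed to work in awk mode.  It shows the original pattern
matching only when the head content is purely word characters, and a
looser pattern that removes the whole head element.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of ORO.
class HeadMatchDemo {
    // The pattern from the question: \w+ accepts only letters,
    // digits, and underscore between the tags.
    static final Pattern WORD_ONLY = Pattern.compile("<head>\\w+</head>");

    // A looser alternative: reluctantly match any characters,
    // including newlines (DOTALL), up to the first closing tag.
    static final Pattern ANY_CONTENT = Pattern.compile(
        "<head>.*?</head>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // True if the original pattern finds a match anywhere in the input.
    static boolean wordOnlyFinds(String input) {
        return WORD_ONLY.matcher(input).find();
    }

    // Returns the input with every head element (tags and content) removed.
    static String stripHead(String input) {
        return ANY_CONTENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String html =
            "<html><head>\n<title>My page</title>\n</head><body>Hi</body></html>";
        System.out.println(wordOnlyFinds("<head>abc_123</head>")); // true
        System.out.println(wordOnlyFinds(html));                   // false
        System.out.println(stripHead(html)); // <html><body>Hi</body></html>
    }
}
```

This is still a quick and dirty hack: it will misbehave on input
such as <head> tags with attributes or a literal "</head>" inside a
comment or script, which is why a real parser is the safer choice.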

daniel