Posted to regexp-user@jakarta.apache.org by Robert Sösemann <rs...@gmx.de> on 2004/11/23 10:17:04 UTC

Regexp instead of XSLT

I need to process HTML that is not well-formed and that tools like Tidy *cannot*
make well-formed, so I decided to apply some regexps to the task instead.

My input is HTML with some extra custom tags that I need to extract, e.g.:
...
<table border="0" cellspacing="0" cellpadding="0" width="100%">
  <tr>
    <my-contenttype name="foo">
    <td>
      <table border="0" cellspacing="0" cellpadding="0" width="100%">
        <tr>
          <my-attribute name="bar">
          <td class="headline_01">
             FooBar
          </td>
          </my-attribute>    
        </tr>
      </table>
    </td>
    </my-contenttype>
  </tr>
</table>
...

I need to extract every opening and closing my-* tag, and also all of the text
between my-attribute tags.
For the input above, the result of the regexp should be:

<my-contenttype name="foo">
  <my-attribute name="bar">
    FooBar
  </my-attribute>
</my-contenttype>
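
For illustration, here is a minimal sketch of one way the extraction described
above could be approached. It is not a definitive solution. It assumes
java.util.regex rather than Jakarta Regexp (the patterns themselves should carry
over), and the class name MyTagExtractor and the method extract are hypothetical
names made up for this example. It also assumes that my-attribute elements are
not nested inside each other and that attribute values never contain ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
class MyTagExtractor {
    // Any opening or closing tag whose name starts with "my-".
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern MY_TAG =
            Pattern.compile("<(/?)(my-[\\w-]+)[^>]*>");
    // Any tag at all; used to strip ordinary HTML inside my-attribute.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = MY_TAG.matcher(html);
        int depth = 0;       // nesting depth of my-* tags, used for indentation
        int textStart = -1;  // offset just after the last <my-attribute ...>
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean isAttribute = m.group(2).equals("my-attribute");
            if (closing) {
                if (isAttribute && textStart >= 0) {
                    // Keep only the text between the my-attribute tags.
                    String inner = html.substring(textStart, m.start());
                    String text = ANY_TAG.matcher(inner).replaceAll("").trim();
                    if (!text.isEmpty()) {
                        indent(out, depth).append(text).append('\n');
                    }
                    textStart = -1;
                }
                depth = Math.max(0, depth - 1);
                indent(out, depth).append(m.group()).append('\n');
            } else {
                indent(out, depth).append(m.group()).append('\n');
                depth++;
                if (isAttribute) {
                    textStart = m.end();
                }
            }
        }
        return out.toString();
    }

    private static StringBuilder indent(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb;
    }

    public static void main(String[] args) {
        String html = "<tr><my-contenttype name=\"foo\"><td>"
                + "<my-attribute name=\"bar\"><td class=\"headline_01\"> FooBar </td>"
                + "</my-attribute></td></my-contenttype></tr>";
        System.out.print(extract(html));
    }
}
```

Because the sketch never tries to balance the ordinary HTML tags, it does not
depend on the surrounding markup being well-formed. It only relies on the my-*
tags themselves appearing in matching open/close order.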

Can anybody help?
