Posted to issues@maven.apache.org by "Lukas Theussl (JIRA)" <ji...@codehaus.org> on 2011/09/16 10:24:17 UTC
[jira] Commented: (DOXIA-441) HTML tags produce undefined behavior on the TWiki parser

    [ https://jira.codehaus.org/browse/DOXIA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=279145#comment-279145 ] 

Lukas Theussl commented on DOXIA-441:
-------------------------------------

I have added a test case to demonstrate the bug: http://svn.apache.org/viewvc?rev=1171439&view=rev

I am not maintaining the TWiki module, but if someone attaches a patch that fixes the test, I will certainly apply it.

> HTML tags produce undefined behavior on the TWiki parser
> --------------------------------------------------------
>
>                 Key: DOXIA-441
>                 URL: https://jira.codehaus.org/browse/DOXIA-441
>             Project: Maven Doxia
>          Issue Type: Bug
>          Components: Module - Twiki
>    Affects Versions: 1.1.4
>         Environment: RHEL 5.5, java 1.6.0_20
>            Reporter: Rodrigo Tobar
>         Attachments: TWikiParserTest.java
>
>
> I'm using the TWiki parser in conjunction with a sink to format some TWiki text. When the text contains HTML tags, the parser produces invalid output. I found this bug while working with a home-brewed sink, but I later reproduced it with other sinks, which suggests that the fault is in the parser itself. The test case I'm attaching uses an XhtmlBaseSink.
> The fault seems to be in org.apache.maven.doxia.module.twiki.parser.TextParser. I see the following possibilities (I don't have the time to produce a patch, so I'll just explain my findings):
>  * Fix the HTML_TAG_PATTERN pattern, since in the example it matches the whole " and a bit of <font color=\"red\">red</font>" string instead of just "<font color=\"red\">red</font>".
>  * If that's not possible, then the pattern compiled in line 117/118 should be changed to take into account the content before the HTML tag, so it would be "(.+)?(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?" (the difference is the initial "(.+)?"). The logic that uses the group numbers would need to change too.
>  * Another solution is to take into account the result of xhtmlMatcher.start(1) in TextParser#parseXHTML:331, so that it recognizes that there is normal text before the actual tag.
> Please point out if this is really a bug in the TWiki parser, or if I'm simply doing something wrong. I couldn't find any reference to this in the mailing lists or elsewhere, and I'm inclined to see it as a bug; therefore, I'm opening this ticket.
> Cheers
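
For illustration, the second and third options in the quoted description can be sketched roughly as follows. This is only a sketch under assumptions, not the actual TextParser code: the class and method names (TagPatternSketch, proposedPattern, splitAroundTag, tagOffset) are hypothetical, and the only parts taken from the description are the proposed pattern string and the idea of reading the matcher's start(1) offset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in Doxia.
class TagPatternSketch {

    // Pattern string as proposed in the issue description,
    // including the extra leading "(.+)?" group.
    static Pattern proposedPattern(String tag) {
        return Pattern.compile(
            "(.+)?(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?");
    }

    // Second option: split the text into
    // { text before the tag, opening tag, tag body, closing tag }.
    // Returns null when the tag is not found.
    static String[] splitAroundTag(String tag, String text) {
        Matcher m = proposedPattern(tag).matcher(text);
        if (!m.find()) {
            return null;
        }
        // Optional groups are null when they did not take part in the match.
        String before = m.group(1) == null ? "" : m.group(1);
        String body = m.group(3) == null ? "" : m.group(3);
        return new String[] { before, m.group(2), body, m.group(4) };
    }

    // Third option: keep the pattern without the leading group and use
    // start(1) to learn how much plain text precedes the opening tag.
    // Returns -1 when the tag is not found.
    static int tagOffset(String tag, String text) {
        Matcher m = Pattern.compile(
            "(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?").matcher(text);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String text = " and a bit of <font color=\"red\">red</font>";
        String[] parts = splitAroundTag("font", text);
        if (parts != null) {
            for (String part : parts) {
                System.out.println("[" + part + "]");
            }
        }
        System.out.println("tag starts at offset " + tagOffset("font", text));
    }
}
```

With either approach the text before the tag becomes available separately from the tag itself, so a parser could emit it as normal text before handling the tag. Whether this fits the existing group-number logic in TextParser would still need to be checked against the test case.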

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira