Posted to issues@maven.apache.org by "Andrius Velykis (JIRA)" <ji...@codehaus.org> on 2012/10/27 23:41:13 UTC

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Andrius Velykis created DOXIA-480:
-------------------------------------

Summary: XhtmlBaseParser ignores XHTML default entities
Key: DOXIA-480
URL: https://jira.codehaus.org/browse/DOXIA-480
Project: Maven Doxia
Issue Type: Bug
Components: Core, Module - Xhtml
Affects Versions: 1.4
Reporter: Andrius Velykis
Attachments: doxia-core-XhtmlBaseParser.patch, doxia-xhtml-entities-bug.zip

XHTML defines a number of default entities that can appear in valid XHTML files (http://www.w3.org/TR/xhtml1/#h-A2), such as left/right quotes: &ldquo;, &rsquo;, and many others.

XhtmlBaseParser, however, ignores XHTML default entities that appear in the source document. It delegates parsing to AbstractXmlParser, which uses a vanilla MXParser, and MXParser recognises only the default XML entities.
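
As a rough illustration of the gap between XML and XHTML entities, the sketch below uses the JDK's built-in DOM parser rather than MXParser, so it is only an analogy: MXParser may react differently to an unknown reference (the report says the entities were ignored rather than rejected). The class name DefaultEntitiesDemo and the helper parses are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Doxia.
public class DefaultEntitiesDemo {

    /** Returns true if the JDK XML parser accepts the snippet, false if it rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. a reference to an undeclared entity.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The five entities predefined by XML itself are accepted.
        System.out.println(parses("<p>&amp; &lt; &gt; &quot; &apos;</p>")); // true
        // An XHTML entity without a DTD declaring it is not known to a plain XML parser.
        System.out.println(parses("<p>&ldquo;quoted&rdquo;</p>"));          // false
    }
}
```

The point is only that entities such as ldquo come from the XHTML DTDs, so a parser that never loads those declarations has no replacement text for them.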

Because the HTML entities are not resolved by the XML parser, and thus not by the XHTML parser, they are not rendered by the XHTML module. I have attached a sample Maven site project that uses the XHTML module. The source file contains double and single quote entities, but the quotes are missing from the output file.

This also affects other parsers that extend XhtmlParser, e.g. MarkdownParser (see DOXIA-473 for a reported bug). This is because the Pegdown library, which is used to parse Markdown, generates &ldquo; for quotes, as well as other entities.

I have attached a patch that fixes this problem. It exposes the XmlPullParser (MXParser) for configuration before parsing, so that subclasses can define default entities. XhtmlBaseParser then adds the default XHTML entities to the parser. This patch will also fix DOXIA-473, because MarkdownParser extends XhtmlParser.
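
The shape of that fix can be sketched without Doxia on the classpath. The code below is not the attached patch: BaseParser, XhtmlLikeParser, configureEntities and resolve are hypothetical names, and a small regex-based resolver stands in for MXParser. It only shows the idea of a hook that lets a subclass register extra entities before parsing. (The XmlPull API has a defineEntityReplacementText method for this purpose; whether the patch uses it is an assumption not confirmed in this thread.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "configure before parsing" hook; not Doxia code.
public class EntityHookSketch {

    /** Stand-in for the XML-level parser: knows only the five predefined XML entities. */
    static class BaseParser {
        private static final Pattern ENTITY_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

        /** Hook: subclasses may add entity definitions before text is resolved. */
        protected void configureEntities(Map<String, String> entities) {
            // No extra entities by default.
        }

        public String resolve(String text) {
            Map<String, String> table = new HashMap<>();
            table.put("amp", "&");
            table.put("lt", "<");
            table.put("gt", ">");
            table.put("quot", "\"");
            table.put("apos", "'");
            configureEntities(table);

            Matcher m = ENTITY_REF.matcher(text);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                // Unknown entities are dropped, mirroring the symptom in the report.
                String replacement = table.getOrDefault(m.group(1), "");
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            return out.toString();
        }
    }

    /** Stand-in for the XHTML-level parser: registers a few XHTML entities via the hook. */
    static class XhtmlLikeParser extends BaseParser {
        @Override
        protected void configureEntities(Map<String, String> entities) {
            entities.put("ldquo", "\u201C");
            entities.put("rdquo", "\u201D");
            entities.put("lsquo", "\u2018");
            entities.put("rsquo", "\u2019");
        }
    }

    public static void main(String[] args) {
        String input = "&ldquo;hi&rdquo; &amp; bye";
        System.out.println(new BaseParser().resolve(input));      // quotes are lost
        System.out.println(new XhtmlLikeParser().resolve(input)); // quotes are kept
    }
}
```

Any parser that extends the XHTML-level class inherits the extra entities, which is why a fix of this shape would also cover a subclass such as MarkdownParser.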

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://jira.codehaus.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Herve Boutemy (JIRA)" <ji...@codehaus.org>.

    [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=317635#comment-317635 ] 

Herve Boutemy commented on DOXIA-480:
-------------------------------------

Would it be possible to add a unit test to show the expected behaviour?

XhtmlBaseParserTest seems to be a good place.
                

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Andrius Velykis (JIRA)" <ji...@codehaus.org>.

    [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312467#comment-312467 ] 

Andrius Velykis commented on DOXIA-480:
---------------------------------------

The quote entities in the bug description were rendered as HTML. It should have read:
bq. ".. such as left/right quotes: {{&amp;ldquo;}}, {{&amp;rsquo;}}, and many others."
                

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Olivier Lamy (JIRA)" <ji...@codehaus.org>.

     [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Lamy closed DOXIA-480.
------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

Fixed in http://svn.apache.org/r1436646
Thanks!
                

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Herve Boutemy (JIRA)" <ji...@codehaus.org>.

    [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=317729#comment-317729 ] 

Herve Boutemy commented on DOXIA-480:
-------------------------------------

Olivier was faster than me: thank you, Andrius, for this good work.
                

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Olivier Lamy (JIRA)" <ji...@codehaus.org>.

     [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Lamy reassigned DOXIA-480:
----------------------------------

    Assignee: Olivier Lamy
    

[jira] (DOXIA-480) XhtmlBaseParser ignores XHTML default entities

Posted by "Andrius Velykis (JIRA)" <ji...@codehaus.org>.

     [ https://jira.codehaus.org/browse/DOXIA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrius Velykis updated DOXIA-480:
----------------------------------

    Attachment: doxia-core-XhtmlBaseParser.patch

Added a unit test to XhtmlBaseParserTest that checks that symbols from XHTML entities are parsed correctly.

Updated the patch to include the test.
                