Posted to issues@shindig.apache.org by "Adam Winer (JIRA)" <ji...@apache.org> on 2009/04/06 19:44:14 UTC

[jira] Commented: (SHINDIG-987) NekoParser returns cryptic error messages when parsing bad html

    [ https://issues.apache.org/jira/browse/SHINDIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696191#action_12696191 ] 

Adam Winer commented on SHINDIG-987:
------------------------------------

Good improvement.  Comments on the patch:
- indentation looks off (case labels are not indented in from the switch)
- Use StringBuilder, not StringBuffer
- the catch block should probably be extracted into its own method
- in test, "catched" -> "caught"
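
To illustrate the StringBuilder and extracted-method comments above, here is a minimal sketch. The patch itself is not reproduced in this message, so the class name, method name, parameters, and message wording below are all hypothetical and not taken from SHINDIG-987.patch.

```java
class ParseErrorMessages {

  /**
   * Hypothetical helper: the body of the catch block pulled out into its
   * own method, so the parsing loop only has to call it.
   */
  static String describeAttributeFailure(String elementName, String attrName,
      String attrValue, RuntimeException cause) {
    // StringBuilder is unsynchronized. This buffer never leaves the method,
    // so the locking that StringBuffer performs is not needed.
    StringBuilder sb = new StringBuilder();
    sb.append("Unable to set attribute '").append(attrName)
        .append("' = '").append(attrValue)
        .append("' on element <").append(elementName).append(">: ")
        .append(cause.getMessage());
    return sb.toString();
  }
}
```

A catch block in the parser could then shrink to a single call to this helper, passing the element name, the attribute name and value, and the caught exception.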

> NekoParser returns cryptic error messages when parsing bad html
> ---------------------------------------------------------------
>
>                 Key: SHINDIG-987
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-987
>             Project: Shindig
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: trunk
>            Reporter: Paul Lindner
>         Attachments: SHINDIG-987.patch
>
>
> startImportantElement can throw exceptions when parsing malformed HTML.
> Given this HTML:
>     <div id="div_super" class="div_super" valign:"middle"></div>
> You get an exception like this:
> org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. 
> 	org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
> 	org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
> 	org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement(NekoSimplifiedHtmlParser.java:292)
> 	org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startElement(NekoSimplifiedHtmlParser.java:242)
> 	org.apache.shindig.gadgets.parse.nekohtml.SocialMarkupHtmlParser$SocialMarkupDocumentHandler.startElement(SocialMarkupHtmlParser.java:130)
> Which is caused here:
>       for (int i = 0; i < xmlAttributes.getLength(); i++) {
>         if (xmlAttributes.getURI(i) != null) {
>           element.setAttributeNS(xmlAttributes.getURI(i), xmlAttributes.getQName(i),
>               xmlAttributes.getValue(i));
>         } else {
>           element.setAttribute(xmlAttributes.getLocalName(i) , xmlAttributes.getValue(i));
>         }
>       }
> because we're trying to set an attribute whose name has a colon in it.
> We should probably add some error checking here so that we can more easily identify the offending HTML without using a debugger.
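
One possible shape for the error checking suggested in the issue description, as a rough sketch only. It uses the DOM implementation bundled with the JDK rather than the Xerces and NekoHTML classes in the stack trace. CheckedAttributeSetter, newDocument, and the message format are assumptions made for illustration, and the attribute name used in the usage note below is a guess at what the parser hands over for the malformed markup.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CheckedAttributeSetter {

  /** Creates an empty DOM document using the JDK's default parser factory. */
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("No DOM implementation available", e);
    }
  }

  /**
   * Hypothetical wrapper around Element.setAttribute. On failure it rethrows
   * a DOMException with the same error code, but with a message that names
   * the element and the offending attribute.
   */
  static void setAttribute(Element element, String name, String value) {
    try {
      element.setAttribute(name, value);
    } catch (DOMException e) {
      throw new DOMException(e.code,
          "Bad attribute [" + name + "] with value [" + value + "] on element <"
              + element.getTagName() + ">: " + e.getMessage());
    }
  }
}
```

Calling CheckedAttributeSetter.setAttribute(div, "valign:\"middle\"", "") on a div element should still fail with INVALID_CHARACTER_ERR, but the message now points at the element and attribute involved, so the bad markup can be found without a debugger.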

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.