Posted to issues@shindig.apache.org by "Adam Winer (JIRA)" <ji...@apache.org> on 2009/04/06 19:44:14 UTC
[jira] Commented: (SHINDIG-987) NekoParser returns cryptic error messages when parsing bad html
[ https://issues.apache.org/jira/browse/SHINDIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696191#action_12696191 ]
Adam Winer commented on SHINDIG-987:
------------------------------------
Good improvement. Comments on the patch:
- indentation looks off (case labels are not indented inside the switch)
- Use StringBuilder, not StringBuffer
- catch() should probably be extracted into its own method
- in test, "catched" -> "caught"
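A rough sketch of how the first three points might look together, not the actual patch: the class name, method name, and message wording are all hypothetical. It builds the error text with a StringBuilder in a method of its own, so the catch block only has to call it, and it indents the case labels inside the switch.

```java
import org.w3c.dom.DOMException;

public class AttributeErrorSketch {
  // Hypothetical helper (not from the patch): turns a failed setAttribute
  // call into a message that names the offending element and attribute.
  static String describeAttributeFailure(String elementName, String attrName,
      String attrValue, DOMException e) {
    StringBuilder sb = new StringBuilder();
    sb.append("Could not set attribute '").append(attrName)
        .append("' (value '").append(attrValue).append("')")
        .append(" on element <").append(elementName).append(">: ");
    switch (e.code) {
      case DOMException.INVALID_CHARACTER_ERR:
        sb.append("invalid character in attribute name");
        break;
      case DOMException.NAMESPACE_ERR:
        sb.append("malformed qualified name or namespace");
        break;
      default:
        // Fall back to whatever the DOM implementation reported.
        sb.append(e.getMessage());
        break;
    }
    return sb.toString();
  }
}
```

The catch block in the parser would then reduce to a single call to this helper, with the result logged or rethrown as the project prefers.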
> NekoParser returns cryptic error messages when parsing bad html
> ---------------------------------------------------------------
>
> Key: SHINDIG-987
> URL: https://issues.apache.org/jira/browse/SHINDIG-987
> Project: Shindig
> Issue Type: Bug
> Components: Java
> Affects Versions: trunk
> Reporter: Paul Lindner
> Attachments: SHINDIG-987.patch
>
>
> startImportantElement can throw exceptions when parsing malformed html:
> Given this html:
> <div id="div_super" class="div_super" valign:"middle"></div>
> You get an exception like this:
> org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
> org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
> org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
> org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement(NekoSimplifiedHtmlParser.java:292)
> org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startElement(NekoSimplifiedHtmlParser.java:242)
> org.apache.shindig.gadgets.parse.nekohtml.SocialMarkupHtmlParser$SocialMarkupDocumentHandler.startElement(SocialMarkupHtmlParser.java:130)
> Which is caused here:
> for (int i = 0; i < xmlAttributes.getLength(); i++) {
>   if (xmlAttributes.getURI(i) != null) {
>     element.setAttributeNS(xmlAttributes.getURI(i), xmlAttributes.getQName(i),
>         xmlAttributes.getValue(i));
>   } else {
>     element.setAttribute(xmlAttributes.getLocalName(i), xmlAttributes.getValue(i));
>   }
> }
> because we're trying to set an attribute whose name contains a colon.
> We should probably add some error checking here so that we can more easily identify the offending HTML without using a debugger.
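The failure can be reproduced outside Shindig with a small standalone sketch. It assumes the parser hands the malformed attribute over with the literal name valign:"middle" (an assumption about tokenization, not something the trace confirms), and it uses the DOM implementation bundled with the JDK rather than the Xerces build shown in the stack trace. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BadAttributeRepro {
  // Returns the DOMException code raised by setAttribute,
  // or 0 if the attribute name was accepted.
  static short trySetAttribute(String attrName) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("No DOM implementation available", e);
    }
    Element div = doc.createElement("div");
    try {
      div.setAttribute(attrName, "");
      return 0;
    } catch (DOMException e) {
      return e.code;
    }
  }

  public static void main(String[] args) {
    // A well-formed name, then the assumed malformed name from the report.
    System.out.println(trySetAttribute("class"));
    System.out.println(trySetAttribute("valign:\"middle\""));
  }
}
```

Comparing the returned code against DOMException.INVALID_CHARACTER_ERR is one way a parser could decide to report the attribute name and the enclosing element instead of letting the bare exception propagate.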
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.