Posted to commits@shindig.apache.org by jo...@apache.org on 2009/12/17 01:48:31 UTC

svn commit: r891496 [1/2] - in /incubator/shindig/trunk: ./ features/src/main/javascript/features/caja/ java/gadgets/ java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/ java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/ java/ga...

Author: johnh
Date: Thu Dec 17 00:48:29 2009
New Revision: 891496

URL: http://svn.apache.org/viewvc?rev=891496&view=rev
Log:
Introduces Caja-based GadgetHtmlParser, and refactors/cleans up HtmlParser impl a bit.

Summary:
* Overhauls the GadgetHtmlParser base class and associated test cases
* Tweaks the Neko-based HTML parser implementation
* Introduces new Caja-based HTML parser

This fairly substantial CL reworks the HTML parsing system to better represent
(though not fully yet) the way that HTML is handled within gadgets: as tag soup,
cleaned up via custom rules after the fact into a legitimate, well-formed
document. It's a step toward treating concrete GadgetHtmlParser implementations
purely as fragment parsers.

Change detail:
* All parsing tests factored into base test classes with concrete tests largely
just providing a concrete parser implementation.
  - HTML-equivalence method added utilizing the (fantastic) diff_match_patch
library, which ignores whitespace, case, and attribute-encoding differences.
* GadgetHtmlParser now does significant cleanup of the DOM it retrieves from
parseDomImpl(...), which BTW will soon go the way of the dodo in favor of always
using parseFragmentImpl(...)
  - Creates head element and populates it with all style elements (only), as
putting these here cannot break rendering and because HTML requires <style> in
head.
  - Creates body element as well.
  - Combines multiple <head> elements together, if present.
  - Prepends head with elements that occurred above a <head> element that
occurred in source, if any.
  - Combines multiple <body> elements together, if present.
  - Prepends to <body> any elements found after the first <head> tag but before
the first <body> tag, and appends to <body> any elements found after the first
<body> tag, provided they have no <head> or <body> parent.
  - As noted above, stuffs all <style> elements found in <body> at the end of
<head>
  - If OpenSocial-type <script> elements are treated per spec (i.e., having only
text, no children), reprocesses this text as HTML and adds as children for
template processing.
* Introduces CajaHtmlParser
  - Still has parseDomImpl method, mostly for API compatibility (short-term)
with Neko-based HtmlParser implementation, which has subtle differences between
parseDomImpl and parseFragmentImpl which I want to clean up in a follow-up CL
(again, obviating the need for parseDomImpl altogether).
  - Delegates to Caja's DomParser class's parseFragment() method for most
parsing needs
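
For readers who want the head/body cleanup rules above in one place, here is a
condensed, standalone sketch of them. It is an illustration only, not the code
in this commit: the class name HeadBodyNormalizerSketch and the helpers el(),
describe() and demo() are hypothetical, it uses only the JDK's org.w3c.dom API,
and it leaves out caching, serialization and the OpenSocial <script> step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical sketch of the cleanup rules; not the Shindig implementation.
public class HeadBodyNormalizerSketch {
  static boolean isElement(Node n, String name) {
    return n.getNodeType() == Node.ELEMENT_NODE && name.equalsIgnoreCase(n.getNodeName());
  }

  static void transferChildren(Node to, Node from) {
    while (from.hasChildNodes()) {
      to.appendChild(from.removeChild(from.getFirstChild()));
    }
  }

  // Inserts the queued nodes at the front of 'to', preserving their source order.
  static void prependAll(Node to, Deque<Node> from) {
    while (!from.isEmpty()) {
      to.insertBefore(from.removeLast(), to.getFirstChild());
    }
  }

  // Rearranges the children of <html> into one <head> followed by one <body>.
  static void normalize(Document doc) {
    Node html = doc.getDocumentElement();
    Node head = null;
    Node body = null;
    Deque<Node> beforeHead = new ArrayDeque<>();
    Deque<Node> beforeBody = new ArrayDeque<>();
    while (html.hasChildNodes()) {
      Node child = html.removeChild(html.getFirstChild());
      if (isElement(child, "head")) {
        if (head == null) head = child; else transferChildren(head, child);
      } else if (isElement(child, "body")) {
        if (body == null) body = child; else transferChildren(body, child);
      } else if (head == null) {
        beforeHead.add(child);
      } else if (body == null) {
        beforeBody.add(child);
      } else {
        body.appendChild(child);  // Found after both <head> and <body>.
      }
    }
    if (head == null) {
      // No <head> in the source: the stray nodes belong in <body> instead.
      Deque<Node> tmp = beforeBody;
      beforeBody = beforeHead;
      beforeHead = tmp;
      head = doc.createElement("head");
    }
    if (body == null) {
      body = doc.createElement("body");
    }
    html.appendChild(head);
    html.appendChild(body);
    prependAll(head, beforeHead);
    prependAll(body, beforeBody);
    // Move <style> children of <body> to the end of <head>.
    List<Node> styles = new ArrayList<>();
    for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (isElement(n, "style")) styles.add(n);
    }
    for (Node s : styles) {
      head.appendChild(body.removeChild(s));
    }
  }

  static Node el(Document doc, String name, Node... kids) {
    Node e = doc.createElement(name);
    for (Node k : kids) e.appendChild(k);
    return e;
  }

  // Renders a tree as name(child,child,...), for inspection only.
  static String describe(Node n) {
    StringBuilder sb = new StringBuilder(n.getNodeName());
    if (n.hasChildNodes()) {
      sb.append('(');
      for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
        if (c != n.getFirstChild()) sb.append(',');
        sb.append(describe(c));
      }
      sb.append(')');
    }
    return sb.toString();
  }

  // Builds a small tag-soup tree, normalizes it, and returns its outline.
  static String demo() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      doc.appendChild(el(doc, "html",
          el(doc, "meta"),
          el(doc, "head", el(doc, "title")),
          el(doc, "p"),
          el(doc, "body", el(doc, "style"), el(doc, "div")),
          el(doc, "body", el(doc, "span")),
          el(doc, "em")));
      normalize(doc);
      return describe(doc.getDocumentElement());
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In the demo input, the stray <meta> precedes <head>, <p> sits between <head> and
the first <body>, there are two <body> elements, and <em> trails them. After
normalize(), <meta> leads <head>, <p> leads the merged <body>, <em> ends it, and
the <style> that was inside <body> is the last child of <head>.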


Added:
    incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test.html
Removed:
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fragment2-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fragment2.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-socialmarkup.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags-expected.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags.html
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test.html
Modified:
    incubator/shindig/trunk/features/src/main/javascript/features/caja/feature.xml
    incubator/shindig/trunk/java/gadgets/pom.xml
    incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java
    incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java
    incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java
    incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java
    incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html
    incubator/shindig/trunk/pom.xml

Modified: incubator/shindig/trunk/features/src/main/javascript/features/caja/feature.xml
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/features/src/main/javascript/features/caja/feature.xml?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/features/src/main/javascript/features/caja/feature.xml (original)
+++ incubator/shindig/trunk/features/src/main/javascript/features/caja/feature.xml Thu Dec 17 00:48:29 2009
@@ -23,7 +23,7 @@
   <gadget>
     <script src="res://com/google/caja/plugin/domita-minified.js"/>
     <script src="caja.js"/>
-    <script src="res://com/google/caja/plugin/valija.co.js"/>
+    <script src="res://com/google/caja/plugin/valija.out.js"/>
     <script src="taming.js"/>
   </gadget>
 </feature>

Modified: incubator/shindig/trunk/java/gadgets/pom.xml
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/pom.xml?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/pom.xml (original)
+++ incubator/shindig/trunk/java/gadgets/pom.xml Thu Dec 17 00:48:29 2009
@@ -129,6 +129,11 @@
       <artifactId>json</artifactId>
     </dependency>
     <dependency>
+      <groupId>diff_match_patch</groupId>
+      <artifactId>diff_match_patch</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
       <groupId>caja</groupId>
       <artifactId>caja</artifactId>
     </dependency>

Modified: incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java Thu Dec 17 00:48:29 2009
@@ -20,20 +20,23 @@
 import org.apache.shindig.common.cache.Cache;
 import org.apache.shindig.common.cache.CacheProvider;
 import org.apache.shindig.common.util.HashUtil;
-import org.apache.shindig.common.xml.DomUtil;
 import org.apache.shindig.gadgets.GadgetException;
 import org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser;
 
 import com.google.common.collect.BiMap;
 import com.google.common.collect.ImmutableBiMap;
+import com.google.common.collect.Lists;
 import com.google.inject.ImplementedBy;
 import com.google.inject.Inject;
 import com.google.inject.Provider;
+import org.w3c.dom.Attr;
 import org.w3c.dom.Document;
 import org.w3c.dom.DocumentFragment;
 import org.w3c.dom.Node;
 import org.w3c.dom.NodeList;
 
+import java.util.LinkedList;
+
 /**
  * Parser for arbitrary HTML content
  */
@@ -89,28 +92,103 @@
       key = HashUtil.rawChecksum(source.getBytes());
       document = documentCache.getElement(key);
     }
+    
     if (document == null) {
       document = parseDomImpl(source);
 
       HtmlSerialization.attach(document, serializerProvider.get(), source);
 
+      Node html = document.getDocumentElement();
+      
+      Node head = null;
+      Node body = null;
+      LinkedList<Node> beforeHead = Lists.newLinkedList();
+      LinkedList<Node> beforeBody = Lists.newLinkedList();
+      
+      while (html.hasChildNodes()) {
+        Node child = html.removeChild(html.getFirstChild());
+        if (child.getNodeType() == Node.ELEMENT_NODE &&
+            "head".equalsIgnoreCase(child.getNodeName())) {
+          if (head == null) {
+            head = child;
+          } else {
+            // Concatenate <head> elements together.
+            transferChildren(head, child);
+          }
+        } else if (child.getNodeType() == Node.ELEMENT_NODE &&
+                   "body".equalsIgnoreCase(child.getNodeName())) {
+          if (body == null) {
+            body = child;
+          } else {
+            // Concatenate <body> elements together.
+            transferChildren(body, child);
+          }
+        } else if (head == null) {
+          beforeHead.add(child);
+        } else if (body == null) {
+          beforeBody.add(child);
+        } else {
+          // Both <head> and <body> are present. Append to tail of <body>.
+          body.appendChild(child);
+        }
+      }
+      
       // Ensure head tag exists
-      if (DomUtil.getFirstNamedChildNode(document.getDocumentElement(), "head") == null) {
+      if (head == null) {
+        // beforeHead contains all elements that should be prepended to <body>. Switch them.
+        LinkedList<Node> temp = beforeBody;
+        beforeBody = beforeHead;
+        beforeHead = temp;
+        
         // Add as first element
-        document.getDocumentElement().insertBefore(
-            document.createElement("head"),
-            document.getDocumentElement().getFirstChild());
-      }
-      // If body not found the document was entirely empty. Create the
-      // element anyway
-      if (DomUtil.getFirstNamedChildNode(document.getDocumentElement(), "body") == null) {
-        document.getDocumentElement().appendChild(
-            document.createElement("body"));
+        head = document.createElement("head");
+        html.insertBefore(head, html.getFirstChild());
+      } else {
+        // Re-append head node.
+        html.appendChild(head);
+      }
+      
+      // Ensure body tag exists.
+      if (body == null) {
+        // Add immediately after head.
+        body = document.createElement("body");
+        html.insertBefore(body, head.getNextSibling());
+      } else {
+        // Re-append body node.
+        html.appendChild(body);
+      }
+      
+      // Leftovers: nodes before the first <head> node found and the first <body> node found.
+      // Prepend beforeHead to the front of <head>, and beforeBody to beginning of <body>,
+      // in the order they were found in the document.
+      prependToNode(head, beforeHead);
+      prependToNode(body, beforeBody);
+      
+      // One exception. <style> nodes from <body> end up at the end of <head>, since doing so
+      // is HTML compliant and can never break rendering due to ordering concerns.
+      LinkedList<Node> styleNodes = Lists.newLinkedList();
+      NodeList bodyKids = body.getChildNodes();
+      for (int i = 0; i < bodyKids.getLength(); ++i) {
+        Node bodyKid = bodyKids.item(i);
+        if (bodyKid.getNodeType() == Node.ELEMENT_NODE &&
+            "style".equalsIgnoreCase(bodyKid.getNodeName())) {
+          styleNodes.add(bodyKid);
+        }
       }
+      
+      for (Node styleNode : styleNodes) {
+        head.appendChild(body.removeChild(styleNode));
+      }
+      
+      // Finally, reprocess all script nodes for OpenSocial purposes, as these
+      // may be interpreted (rightly, from the perspective of HTML) as containing text only.
+      reprocessScriptForOpenSocial(html);
+      
       if (shouldCache) {
         documentCache.addElement(key, document);
       }
     }
+    
     if (shouldCache) {
       Document copy = (Document)document.cloneNode(true);
       HtmlSerialization.copySerializer(document, copy);
@@ -118,6 +196,18 @@
     }
     return document;
   }
+  
+  protected void transferChildren(Node to, Node from) {
+    while (from.hasChildNodes()) {
+      to.appendChild(from.removeChild(from.getFirstChild()));
+    }
+  }
+  
+  protected void prependToNode(Node to, LinkedList<Node> from) {
+    while (from.size() > 0) {
+      to.insertBefore(from.removeLast(), to.getFirstChild());
+    }
+  }
 
   /**
    * Parses a snippet of markup and appends the result as children to the 
@@ -139,6 +229,7 @@
       }
     }
     DocumentFragment fragment = parseFragmentImpl(source);
+    reprocessScriptForOpenSocial(fragment);
     if (shouldCache) {
       fragmentCache.addElement(key, fragment);
     }
@@ -157,13 +248,63 @@
   protected boolean shouldCache() {
     return documentCache != null && documentCache.getCapacity() != 0;
   }
-
+  
+  private void reprocessScriptForOpenSocial(Node root) throws GadgetException {
+    LinkedList<Node> nodeQueue = Lists.newLinkedList();
+    nodeQueue.add(root);
+    while (!nodeQueue.isEmpty()) {
+      Node next = nodeQueue.removeFirst();
+      if (next.getNodeType() == Node.ELEMENT_NODE &&
+          "script".equalsIgnoreCase(next.getNodeName())) {
+        Attr typeAttr = (Attr)next.getAttributes().getNamedItem("type");
+        if (typeAttr != null && SCRIPT_TYPE_TO_OSML_TAG.get(typeAttr.getValue()) != null) {
+          // The underlying parser impl may have already parsed these.
+          // Only re-parse with the coalesced text children if that's all there are.
+          boolean parseOs = true;
+          StringBuilder sb = new StringBuilder();
+          NodeList scriptKids = next.getChildNodes();
+          for (int i = 0; parseOs && i < scriptKids.getLength(); ++i) {
+            Node scriptKid = scriptKids.item(i);
+            if (scriptKid.getNodeType() != Node.TEXT_NODE) {
+              parseOs = false;
+            }
+            sb.append(scriptKid.getTextContent());
+          }
+          if (parseOs) {
+            // Clean out the script node.
+            while (next.hasChildNodes()) {
+              next.removeChild(next.getFirstChild());
+            }
+            DocumentFragment osFragment = parseFragmentImpl(sb.toString());
+            while (osFragment.hasChildNodes()) {
+              Node osKid = osFragment.removeChild(osFragment.getFirstChild());
+              osKid = next.getOwnerDocument().adoptNode(osKid);
+              if (osKid.getNodeType() == Node.ELEMENT_NODE) {
+                next.appendChild(osKid);
+              }
+            }
+          }
+        }
+      }
+      
+      // Enqueue children for inspection.
+      NodeList children = next.getChildNodes();
+      for (int i = 0; i < children.getLength(); ++i) {
+        nodeQueue.add(children.item(i));
+      }
+    }
+  }
+  
   /**
-   * @param source
-   * @return a parsed document or document fragment
+   * TODO: remove the need for parseDomImpl as a parsing method. Gadget HTML is
+   * tag soup handled in custom fashion, or is a legitimate fragment. In either case,
+   * we can simply use the fragment parsing implementation and patch up in higher-level calls.
+   * @param source a piece of HTML
+   * @return a Document parsed from the HTML
    * @throws GadgetException
    */
-  protected abstract Document parseDomImpl(String source) throws GadgetException;
+  protected abstract Document parseDomImpl(String source)
+      throws GadgetException;
 
   /**
    * @param source a snippet of HTML markup
@@ -172,39 +313,6 @@
    */
   protected abstract DocumentFragment parseFragmentImpl(String source) 
       throws GadgetException;
-  
-  /**
-   * Normalize head and body tags in the passed fragment before including it
-   * in the document
-   * @param document
-   * @param fragment
-   */
-  protected void normalizeFragment(Document document, DocumentFragment fragment) {
-    Node htmlNode = DomUtil.getFirstNamedChildNode(fragment, "HTML");
-    if (htmlNode != null) {
-      document.appendChild(htmlNode);
-    } else {
-      Node bodyNode = DomUtil.getFirstNamedChildNode(fragment, "body");
-      Node headNode = DomUtil.getFirstNamedChildNode(fragment, "head");
-      if (bodyNode != null || headNode != null) {
-        // We have either a head or body so put fragment into HTML tag
-        Node root = document.appendChild(document.createElement("html"));
-        if (headNode != null && bodyNode == null) {
-          fragment.removeChild(headNode);
-          root.appendChild(headNode);
-          Node body = root.appendChild(document.createElement("body"));
-          body.appendChild(fragment);
-        } else {
-          root.appendChild(fragment);
-        }
-      } else {
-        // No head or body so put fragment into a body
-        Node root = document.appendChild(document.createElement("html"));
-        Node body = root.appendChild(document.createElement("body"));
-        body.appendChild(fragment);
-      }
-    }
-  }
 
   private static class DefaultSerializerProvider implements Provider<HtmlSerializer> {
     public HtmlSerializer get() {

Added: incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,136 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ */
+package org.apache.shindig.gadgets.parse.caja;
+
+import java.util.LinkedList;
+
+import com.google.caja.lexer.CharProducer;
+import com.google.caja.lexer.HtmlLexer;
+import com.google.caja.lexer.InputSource;
+import com.google.caja.lexer.ParseException;
+import com.google.caja.parser.html.DomParser;
+import com.google.caja.parser.html.Namespaces;
+import com.google.caja.reporting.Message;
+import com.google.caja.reporting.MessageLevel;
+import com.google.caja.reporting.MessageQueue;
+import com.google.caja.reporting.SimpleMessageQueue;
+import com.google.common.collect.Lists;
+import com.google.inject.Inject;
+
+import org.apache.shindig.gadgets.GadgetException;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.w3c.dom.DOMImplementation;
+import org.w3c.dom.Document;
+import org.w3c.dom.DocumentFragment;
+import org.w3c.dom.Node;
+
+public class CajaHtmlParser extends GadgetHtmlParser {
+  private final DOMImplementation documentFactory;
+
+  @Inject
+  public CajaHtmlParser(DOMImplementation documentFactory) {
+    this.documentFactory = documentFactory;
+  }
+  
+  @Override
+  protected Document parseDomImpl(String source) throws GadgetException {
+    DocumentFragment fragment = parseFragmentImpl(source);
+    
+    // TODO: remove parseDomImpl() altogether; only have subclasses
+    // support parseFragmentImpl() with base class cleaning up.
+    Document document = fragment.getOwnerDocument();
+    Node html = null;
+    LinkedList<Node> beforeHtml = Lists.newLinkedList();
+    while (fragment.hasChildNodes()) {
+      Node child = fragment.removeChild(fragment.getFirstChild());
+      if (child.getNodeType() == Node.ELEMENT_NODE &&
+          "html".equalsIgnoreCase(child.getNodeName())) {
+        if (html == null) {
+          html = child;
+        } else {
+          // Ignore the current (duplicated) html node but add its children
+          transferChildren(html, child);
+        }
+      } else if (html != null) {
+        html.appendChild(child);
+      } else {
+        beforeHtml.add(child);
+      }
+    }
+    
+    if (html == null) {
+      html = document.createElement("html");
+    }
+    
+    prependToNode(html, beforeHtml);
+    
+    // Ensure document.getDocumentElement() is html node.
+    document.appendChild(html);
+    
+    return document;
+  }
+
+  @Override
+  protected DocumentFragment parseFragmentImpl(String source)
+      throws GadgetException {
+    try {
+      MessageQueue mq = makeMessageQueue();
+      DomParser parser = getDomParser(source, mq);
+      DocumentFragment fragment = parser.parseFragment();
+      if (mq.hasMessageAtLevel(MessageLevel.ERROR)) {
+        StringBuilder err = new StringBuilder();
+        for (Message m : mq.getMessages()) {
+          err.append(m.toString()).append("\n");
+        }
+        throw new GadgetException(GadgetException.Code.HTML_PARSE_ERROR, err.toString());
+      }
+      return fragment;
+    } catch (ParseException e) {
+      throw new GadgetException(
+          GadgetException.Code.HTML_PARSE_ERROR, e.getCajaMessage().toString());
+    }
+  }
+
+  protected InputSource getInputSource() {
+    // Returns a default/dummy InputSource.
+    // We might consider adding the gadget URI to the GadgetHtmlParser API,
+    // but in the meantime this method is protected to allow overriding this
+    // with request-scoped retrieval of this same data.
+    return InputSource.UNKNOWN;
+  }
+  
+  protected MessageQueue makeMessageQueue() {
+    return new SimpleMessageQueue();
+  }
+  
+  protected boolean needsDebugData() {
+    return false;
+  }
+  
+  private DomParser getDomParser(String source, final MessageQueue mq) throws ParseException {
+    InputSource is = getInputSource();
+    HtmlLexer lexer = new HtmlLexer(CharProducer.Factory.fromString(source, is));
+    final Namespaces ns = Namespaces.HTML_DEFAULT;  // Includes OpenSocial
+    final boolean needsDebugData = needsDebugData();
+    DomParser parser = new DomParser(lexer, is, ns, mq);
+    parser.setDomImpl(documentFactory);
+    parser.setWantsComments(true);
+    parser.setNeedsDebugData(needsDebugData);
+    return parser;
+  }
+}

Modified: incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java Thu Dec 17 00:48:29 2009
@@ -107,8 +107,7 @@
     }
 
     Document document = handler.getDocument();
-    DocumentFragment fragment = handler.getFragment();
-    normalizeFragment(document, fragment);
+    document.appendChild(handler.getFragment().getFirstChild());
     fixNekoWeirdness(document);
     return document;
   }

Modified: incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java Thu Dec 17 00:48:29 2009
@@ -128,7 +128,6 @@
       MessageQueue mq = new SimpleMessageQueue();
       BuildInfo bi = BuildInfo.getInstance();
       DefaultGadgetRewriter rw = new DefaultGadgetRewriter(bi, mq);
-      rw.setValijaMode(true);
       InputSource is = new InputSource(retrievedUri);
       boolean safe = false;
       

Modified: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java Thu Dec 17 00:48:29 2009
@@ -16,34 +16,74 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 package org.apache.shindig.gadgets.parse;
 
-import org.apache.commons.io.IOUtils;
-import org.apache.commons.lang.StringUtils;
-
-import org.junit.Assert;
-import org.w3c.dom.Document;
+import static org.junit.Assert.assertNull;
 
-import java.io.IOException;
+import org.junit.Before;
+import org.junit.Test;
 
 /**
  * Base test fixture for HTML parsing and serialization.
  */
-public abstract class AbstractParserAndSerializerTest extends Assert {
+public abstract class AbstractParserAndSerializerTest extends AbstractParsingTestBase {
+  protected GadgetHtmlParser parser;
+  
+  protected abstract GadgetHtmlParser makeParser();
+  
+  @Before
+  public void setUp() throws Exception {
+    parser = makeParser();
+  }
+
+  @Test
+  public void docWithDoctype() throws Exception {
+    // Note that doctype is properly retained
+    String content = loadFile("org/apache/shindig/gadgets/parse/test.html");
+    String expected = loadFile("org/apache/shindig/gadgets/parse/test-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
+  }
 
-  /** The vm line separator */
-  private static final String EOL = System.getProperty("line.separator");
+  @Test
+  public void docNoDoctype() throws Exception {
+    // Note that no doctype is properly created when none specified
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html");
+    String expected =
+        loadFile("org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html");
+    assertNull(parser.parseDom(content).getDoctype());
+    parseAndCompareBalanced(content, expected, parser);
+  }
+
+  @Test
+  public void notADocument() throws Exception {
+    // Note that no doctype is injected for fragments
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-fragment.html");
+    String expected = loadFile("org/apache/shindig/gadgets/parse/test-fragment-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
+  }
+
+  @Test
+  public void notADocument2() throws Exception {
+    // Note that no doctype is injected for fragments
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-fragment2.html");
+    String expected = loadFile("org/apache/shindig/gadgets/parse/test-fragment2-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
+  }
 
-  protected String loadFile(String path) throws IOException {
-    return IOUtils.toString(this.getClass().getClassLoader().
-        getResourceAsStream(path));
+  @Test
+  public void noBody() throws Exception {
+    // Input has a <head> but no <body>
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-headnobody.html");
+    String expected = loadFile("org/apache/shindig/gadgets/parse/test-headnobody-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
   }
 
-  protected void parseAndCompareBalanced(String content, String expected, GadgetHtmlParser parser)
-      throws Exception {
-    Document document = parser.parseDom(content);
-    expected = StringUtils.replace(expected, EOL, "\n");
-    assertEquals(expected.trim(), HtmlSerialization.serialize(document).trim());
+  @Test
+  public void ampersand() throws Exception {
+    // Input contains ampersands
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-with-ampersands.html");
+    String expected =
+        loadFile("org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
   }
 }
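
The tests above read their fixtures through loadFile(path), which this commit moves into AbstractParsingTestBase. As committed, a missing resource reaches IOUtils.toString as a null stream. A minimal JDK-only sketch of a null-safe variant follows. The class name ResourceLoaderSketch, the method name loadResource, and the UTF-8 charset are assumptions for illustration and are not part of this commit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of this commit. */
final class ResourceLoaderSketch {
  /** Reads a classpath resource fully, failing with a clear message if it is missing. */
  static String loadResource(String path) throws IOException {
    ClassLoader loader = ResourceLoaderSketch.class.getClassLoader();
    // try-with-resources closes the stream; a null resource is simply skipped on close.
    try (InputStream is = loader.getResourceAsStream(path)) {
      if (is == null) {
        throw new IOException("Resource not found on classpath: " + path);
      }
      // Assumption: fixtures are UTF-8 encoded. Requires Java 9+ for readAllBytes().
      return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
  }
}
```

With this shape, a mistyped fixture path fails with the offending path in the message instead of a NullPointerException.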

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse;
+
+import static org.junit.Assert.assertEquals;
+
+import name.fraser.neil.plaintext.diff_match_patch;
+import name.fraser.neil.plaintext.diff_match_patch.Diff;
+import name.fraser.neil.plaintext.diff_match_patch.Operation;
+
+import org.apache.commons.io.IOUtils;
+import org.apache.commons.lang.StringEscapeUtils;
+import org.apache.commons.lang.StringUtils;
+import org.w3c.dom.Document;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.LinkedList;
+
+/**
+ * Simple base class providing test helpers for parsing/serializing tests.
+ */
+public abstract class AbstractParsingTestBase {
+  /** The VM line separator */
+  private static final String EOL = System.getProperty("line.separator");
+
+  protected String loadFile(String path) throws IOException {
+    InputStream is = this.getClass().getClassLoader().getResourceAsStream(path);
+    // Enable this fallback if your IDE has trouble loading resources from the classpath.
+    /*
+    if (is == null) {
+      is = new FileInputStream(new File("/shindig/base/java/gadgets/src/test/resources/" + path));
+    }
+    */
+    return IOUtils.toString(is);
+  }
+
+  protected void parseAndCompareBalanced(String content, String expected, GadgetHtmlParser parser)
+      throws Exception {
+    Document document = parser.parseDom(content);
+    expected = StringUtils.replace(expected, EOL, "\n");
+    String serialized = HtmlSerialization.serialize(document);
+    assertHtmlEquals(expected, serialized);
+  }
+  
+  private void assertHtmlEquals(String expected, String serialized) {
+    // Compute the diff of expected vs. serialized, and disregard constructs that we don't
+    // care about, such as whitespace deltas and differently-computed escape sequences.
+    diff_match_patch dmp = new diff_match_patch();
+    LinkedList<Diff> diffs = dmp.diff_main(expected, serialized);
+    while (diffs.size() > 0) {
+      Diff cur = diffs.removeFirst();
+      switch (cur.operation) {
+      case DELETE:
+        if (StringUtils.isBlank(cur.text) || "amp;".equalsIgnoreCase(cur.text)) {
+          continue;
+        }
+        if (diffs.size() == 0) {
+          // End of the set: assert known failure.
+          assertEquals(expected, serialized);
+        }
+        Diff next = diffs.removeFirst();
+        if (next.operation != Operation.INSERT) {
+          // Next operation isn't a paired insert: assert known failure.
+          assertEquals(expected, serialized);
+        }
+        if (!equivalentEntities(cur.text, next.text) &&
+            !cur.text.equalsIgnoreCase(next.text)) {
+          // Delete/insert pair: fail unless the two texts differ only
+          // in case or in entity encoding.
+          assertEquals(expected, serialized);
+        }
+        break;
+      case INSERT:
+        // Assert known failure unless insert is blank or a bare "amp;".
+        if (StringUtils.isBlank(cur.text) || "amp;".equalsIgnoreCase(cur.text)) {
+          continue;
+        }
+        assertEquals(expected, serialized);
+        break;
+      default:
+        // EQUAL: move on.
+        break;
+      }
+    }
+  }
+  
+  private boolean equivalentEntities(String prev, String cur) {
+    if (!prev.endsWith(";") && !cur.endsWith(";")) {
+      return false;
+    }
+    String prevEnt = StringEscapeUtils.unescapeHtml(prev);
+    String curEnt = StringEscapeUtils.unescapeHtml(cur);
+    return prevEnt.equals(curEnt);
+  }
+}
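
assertHtmlEquals above tolerates whitespace, case, and entity-encoding differences by walking a diff_match_patch diff. The sketch below is not that algorithm. It is a simpler normalization-based approximation using only the JDK: strip whitespace, lowercase, and decode a few entities before comparing. The class name HtmlEquivalenceSketch and the fixed entity list are assumptions, and the lowercasing is coarser than the diff-based check, which only forgives case inside a delete/insert pair.

```java
import java.util.Locale;

/** Hypothetical approximation of the tolerance rules; not the commit's diff-based check. */
final class HtmlEquivalenceSketch {
  /** Reduces markup to a form where whitespace, case, and basic entity encoding no longer matter. */
  static String normalize(String html) {
    String s = html.replaceAll("\\s+", "");   // ignore all whitespace deltas
    s = s.toLowerCase(Locale.ROOT);           // ignore case differences
    // Decode a small, fixed set of entities. "&amp;" goes last so that
    // "&amp;lt;" decodes once to "&lt;" rather than twice to "<".
    s = s.replace("&lt;", "<")
         .replace("&gt;", ">")
         .replace("&quot;", "\"")
         .replace("&amp;", "&");
    return s;
  }

  static boolean equivalent(String expected, String actual) {
    return normalize(expected).equals(normalize(actual));
  }

  public static void main(String[] args) {
    System.out.println(equivalent("<B>Tom &amp; Jerry</B>\n", "<b>Tom & Jerry</b>"));
    System.out.println(equivalent("<p>a</p>", "<p>b</p>"));
  }
}
```

The diff-based version in the commit has the advantage that a failure falls through to assertEquals(expected, serialized), so the report shows the original strings rather than normalized ones.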

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import org.apache.commons.lang.StringUtils;
+import org.apache.shindig.common.xml.DomUtil;
+import org.apache.shindig.gadgets.GadgetException;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.apache.shindig.gadgets.parse.HtmlSerialization;
+import org.apache.shindig.gadgets.spec.PipelinedData;
+
+import org.junit.Before;
+import org.junit.Test;
+import org.w3c.dom.Attr;
+import org.w3c.dom.DOMException;
+import org.w3c.dom.Document;
+import org.w3c.dom.Element;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
+
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Test for the social markup parser.
+ */
+public abstract class AbstractSocialMarkupHtmlParserTest extends AbstractParsingTestBase {
+  private GadgetHtmlParser parser;
+  private Document document;
+
+  protected abstract GadgetHtmlParser makeParser();
+  
+  @Before
+  public void setUp() throws Exception {
+    parser = makeParser();
+
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-socialmarkup.html");
+    document = parser.parseDom(content);
+  }
+
+  @Test
+  public void testSocialData() {
+    // Verify elements are preserved in social data
+    List<Element> scripts = getTags(GadgetHtmlParser.OSML_DATA_TAG);
+    assertEquals(1, scripts.size());
+    
+    NodeList viewerRequests = scripts.get(0).getElementsByTagNameNS(
+        PipelinedData.OPENSOCIAL_NAMESPACE, "ViewerRequest");
+    assertEquals(1, viewerRequests.getLength());
+    Element viewerRequest = (Element) viewerRequests.item(0);
+    assertEquals("viewer", viewerRequest.getAttribute("key"));
+    assertEmpty(viewerRequest);
+  }
+
+  @Test
+  public void testSocialTemplate() {
+    // Verify elements and text content are preserved in social templates
+    List<Element> scripts = getTags(GadgetHtmlParser.OSML_TEMPLATE_TAG);
+    assertEquals(1, scripts.size());
+    
+    assertEquals("template-id", scripts.get(0).getAttribute("id"));
+    assertEquals("template-name", scripts.get(0).getAttribute("name"));
+    assertEquals("template-tag", scripts.get(0).getAttribute("tag"));
+    
+    NodeList boldElements = scripts.get(0).getElementsByTagName("b");
+    assertEquals(1, boldElements.getLength());
+    Element boldElement = (Element) boldElements.item(0);
+    assertEquals("Some ${viewer} content", boldElement.getTextContent());
+    
+    NodeList osHtmlElements = scripts.get(0).getElementsByTagNameNS(
+        "http://ns.opensocial.org/2008/markup", "Html");
+    assertEquals(1, osHtmlElements.getLength());
+  }
+
+  @Test
+  public void testSocialTemplateSerialization() {
+    String content = HtmlSerialization.serialize(document);
+    assertTrue("Empty elements not preserved as XML inside template",
+        content.contains("<img/>"));
+  }
+
+  @Test
+  public void testJavascript() {
+    // Verify text content is unmodified in javascript blocks
+    List<Element> scripts = getTags("script");
+    
+    // Remove any OpenSocial-specific nodes.
+    Iterator<Element> scriptIt = scripts.iterator();
+    while (scriptIt.hasNext()) {
+      if (isOpenSocialScript(scriptIt.next())) {
+        scriptIt.remove();
+      }
+    }
+    
+    assertEquals(1, scripts.size());
+    
+    NodeList boldElements = scripts.get(0).getElementsByTagName("b");
+    assertEquals(0, boldElements.getLength());
+
+    String scriptContent = scripts.get(0).getTextContent().trim();
+    assertEquals("<b>Some ${viewer} content</b>", scriptContent);
+  }
+
+  @Test
+  public void testPlainContent() {
+    // Verify text content is preserved in non-script content
+    NodeList spanElements = document.getElementsByTagName("span");
+    assertEquals(1, spanElements.getLength());
+    assertEquals("Some content", spanElements.item(0).getTextContent());
+  }
+
+  @Test
+  public void testCommentOrdering() {
+    NodeList divElements = document.getElementsByTagName("div");
+    assertEquals(1, divElements.getLength());
+    NodeList children = divElements.item(0).getChildNodes();
+    assertEquals(3, children.getLength());
+    
+    // Should be comment/text/comment, not comment/comment/text
+    assertEquals(Node.COMMENT_NODE, children.item(0).getNodeType());
+    assertEquals(Node.TEXT_NODE, children.item(1).getNodeType());
+    assertEquals(Node.COMMENT_NODE, children.item(2).getNodeType());
+  }
+  
+  @Test
+  public void testInvalid() throws Exception {
+    String content =
+        "<html><div id=\"div_super\" class=\"div_super\" valign:\"middle\"></div></html>";
+    try {
+      parser.parseDom(content);
+      fail("No exception caught on invalid character");
+    } catch (DOMException e) {
+      assertTrue(e.getMessage().contains("INVALID_CHARACTER_ERR"));
+      assertTrue(e.getMessage().contains(
+          "Around ...<div id=\"div_super\" class=\"div_super\"..."));
+    } catch (GadgetException e) {
+      assertEquals(GadgetException.Code.HTML_PARSE_ERROR, e.getCode());
+    }
+  }
+
+  private void assertEmpty(Node n) {
+    if (n.getChildNodes().getLength() != 0) {
+      assertTrue(StringUtils.isEmpty(n.getTextContent()) ||
+          StringUtils.isWhitespace(n.getTextContent()));
+    }
+  }
+
+  private List<Element> getTags(String tagName) {
+    NodeList list = document.getElementsByTagName(tagName);
+    List<Element> elements = Lists.newArrayListWithExpectedSize(list.getLength());
+    for (int i = 0; i < list.getLength(); i++) {
+      elements.add((Element) list.item(i));
+    }
+    
+    // Add equivalent <script> elements
+    String scriptType = GadgetHtmlParser.SCRIPT_TYPE_TO_OSML_TAG.inverse().get(tagName);
+    if (scriptType != null) {
+      List<Element> scripts =
+          DomUtil.getElementsByTagNameCaseInsensitive(document, ImmutableSet.of("script"));
+      for (Element script : scripts) {
+        Attr typeAttr = (Attr)script.getAttributes().getNamedItem("type");
+        if (typeAttr != null && scriptType.equalsIgnoreCase(typeAttr.getValue())) {
+          elements.add(script);
+        }
+      }
+    }
+    return elements;
+  }
+  
+  private boolean isOpenSocialScript(Element script) {
+    Attr typeAttr = (Attr)script.getAttributes().getNamedItem("type");
+    return (typeAttr != null && typeAttr.getValue() != null &&
+            GadgetHtmlParser.SCRIPT_TYPE_TO_OSML_TAG.containsKey(typeAttr.getValue()));
+  }
+}
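
getTags above gathers both the literal OSML element and any <script> element whose type maps to the same tag. A JDK-only sketch of that lookup follows. The tag names and script types in the map are assumed values for illustration (the real ones live in GadgetHtmlParser.SCRIPT_TYPE_TO_OSML_TAG), and the class name TagLookupSketch is hypothetical. Unlike the committed helper, the sketch matches the "script" tag name case-sensitively.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for the getTags helper; not part of this commit. */
final class TagLookupSketch {
  // Assumed mapping from OSML tag name to equivalent script type (illustrative values).
  static final Map<String, String> TAG_TO_SCRIPT_TYPE =
      Map.of("OSData", "text/os-data", "OSTemplate", "text/os-template");

  /** Returns elements named tagName plus script elements whose type is equivalent to it. */
  static List<Element> getTags(Document doc, String tagName) {
    List<Element> out = new ArrayList<>();
    NodeList direct = doc.getElementsByTagName(tagName);
    for (int i = 0; i < direct.getLength(); i++) {
      out.add((Element) direct.item(i));
    }
    String scriptType = TAG_TO_SCRIPT_TYPE.get(tagName);
    if (scriptType != null) {
      NodeList scripts = doc.getElementsByTagName("script");
      for (int i = 0; i < scripts.getLength(); i++) {
        Element script = (Element) scripts.item(i);
        // getAttribute returns "" when the attribute is absent, so no null check is needed.
        if (scriptType.equalsIgnoreCase(script.getAttribute("type"))) {
          out.add(script);
        }
      }
    }
    return out;
  }

  /** Parses well-formed XML into a DOM, wrapping checked exceptions for brevity. */
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```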

Modified: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java Thu Dec 17 00:48:29 2009
@@ -18,9 +18,11 @@
  */
 package org.apache.shindig.gadgets.parse;
 
-import org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 import com.google.inject.Provider;
+
 import org.junit.Before;
 import org.junit.Test;
 
@@ -30,10 +32,11 @@
 /**
  * Test cases for CompactHtmlSerializer.
  */
-public class CompactHtmlSerializerTest extends AbstractParserAndSerializerTest {
+public abstract class CompactHtmlSerializerTest extends AbstractParsingTestBase {
+  
+  protected abstract GadgetHtmlParser makeParser();
 
-  private GadgetHtmlParser full = new NekoSimplifiedHtmlParser(
-      new ParseModule.DOMImplementationProvider().get());
+  private GadgetHtmlParser full = makeParser();
 
   @Before
   public void setUp() throws Exception {
@@ -45,24 +48,24 @@
   }
 
   @Test
-  public void testWhitespaceNotCollapsedInSpecialTags() throws Exception {
+  public void whitespaceNotCollapsedInSpecialTags() throws Exception {
     String content = loadFile(
-        "org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags.html");
+        "org/apache/shindig/gadgets/parse/test-with-specialtags.html");
     String expected = loadFile(
-        "org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags-expected.html");
+        "org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html");
     parseAndCompareBalanced(content, expected, full);
   }
-
+  
   @Test
-  public void testIeConditionalCommentNotRemoved() throws Exception {
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments.html");
+  public void ieConditionalCommentNotRemoved() throws Exception {
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-with-iecond-comments.html");
     String expected = loadFile(
-        "org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments-expected.html");
+        "org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html");
     parseAndCompareBalanced(content, expected, full);
   }
 
   @Test
-  public void testSpecialTagsAreRecognized() {
+  public void specialTagsAreRecognized() {
     assertSpecialTag("textArea");
     assertSpecialTag("scrIpt");
     assertSpecialTag("Style");
@@ -78,7 +81,6 @@
         CompactHtmlSerializer.isSpecialTag(tagName.toLowerCase()));
   }
 
-  @Test
   public void testCollapseHtmlWhitespace() throws IOException {
     assertCollapsed("abc", "abc");
     assertCollapsed("abc ", "abc");
@@ -96,4 +98,4 @@
     CompactHtmlSerializer.collapseWhitespace(input, output);
     assertEquals(expected, output.toString());
   }
-}
\ No newline at end of file
+}
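
CompactHtmlSerializerTest now fills its `full` field from makeParser() in a field initializer. That is safe for the concrete factories in this commit, which depend on no subclass state. The sketch below shows the general Java initialization order behind that caveat. All class names are hypothetical and unrelated to Shindig.

```java
/** Hypothetical demonstration of calling an overridable factory from a field initializer. */
final class InitOrderSketch {
  abstract static class Base {
    // Runs during Base construction, before any subclass field initializer.
    final String made = make();

    protected abstract String make();
  }

  /** Factory that needs no subclass state: behaves as expected. */
  static final class Stateless extends Base {
    @Override
    protected String make() {
      return "parser";
    }
  }

  /** Factory that reads a subclass field: the field is still null when make() runs. */
  static final class Stateful extends Base {
    private String name = "parser"; // assigned only after Base's initializers have finished

    @Override
    protected String make() {
      return name;
    }
  }

  public static void main(String[] args) {
    System.out.println(new Stateless().made);
    System.out.println(new Stateful().made);
  }
}
```

Creating the parser in the @Before method, as AbstractParserAndSerializerTest does, avoids the ordering question entirely.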

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse.caja;
+
+import org.apache.shindig.gadgets.parse.CompactHtmlSerializerTest;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.apache.shindig.gadgets.parse.ParseModule;
+
+public class CajaCompactHtmlSerializerTest extends CompactHtmlSerializerTest {
+
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new CajaHtmlParser(new ParseModule.DOMImplementationProvider().get());
+  }
+
+}

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse.caja;
+
+import org.apache.shindig.gadgets.parse.AbstractParserAndSerializerTest;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.apache.shindig.gadgets.parse.ParseModule;
+
+public class CajaParserAndSerializerTest extends AbstractParserAndSerializerTest {
+
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new CajaHtmlParser(new ParseModule.DOMImplementationProvider().get());
+  }
+
+}

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse.caja;
+
+import org.apache.shindig.gadgets.parse.AbstractSocialMarkupHtmlParserTest;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.apache.shindig.gadgets.parse.ParseModule;
+
+public class CajaSocialMarkupHtmlParserTest extends AbstractSocialMarkupHtmlParserTest {
+
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new CajaHtmlParser(new ParseModule.DOMImplementationProvider().get());
+  }
+
+}

Added: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java Thu Dec 17 00:48:29 2009
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.shindig.gadgets.parse.nekohtml;
+
+import org.apache.shindig.gadgets.parse.CompactHtmlSerializerTest;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
+import org.apache.shindig.gadgets.parse.ParseModule;
+
+/**
+ * Compact HTML serializer test using the Neko parser implementation.
+ */
+public class NekoCompactHtmlSerializerTest extends CompactHtmlSerializerTest {
+
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new NekoSimplifiedHtmlParser(
+        new ParseModule.DOMImplementationProvider().get());
+  }
+
+}

Modified: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java Thu Dec 17 00:48:29 2009
@@ -17,7 +17,10 @@
  */
 package org.apache.shindig.gadgets.parse.nekohtml;
 
+import static org.junit.Assert.assertNull;
+
 import org.apache.shindig.gadgets.parse.AbstractParserAndSerializerTest;
+import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
 import org.apache.shindig.gadgets.parse.ParseModule;
 import org.junit.Test;
 
@@ -25,61 +28,51 @@
  * Test behavior of neko based parser and serializers
  */
 public class NekoParserAndSerializeTest extends AbstractParserAndSerializerTest {
-
-  private NekoSimplifiedHtmlParser simple = new NekoSimplifiedHtmlParser(
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new NekoSimplifiedHtmlParser(
         new ParseModule.DOMImplementationProvider().get());
-
-  @Test
-  public void testDocWithDoctype() throws Exception {
-    // Note that doctype is properly retained
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test.html");
-    String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
   }
-
+  
+  // Neko-specific tests.
   @Test
-  public void testDocNoDoctype() throws Exception {
-    // Note that no doctype is properly created when none specified
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype.html");
-    assertNull(simple.parseDom(content).getDoctype());
-  }
+  public void scriptPushedToBody() throws Exception {
+    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript.html");
+    String expected =
+        loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html");
+    parseAndCompareBalanced(content, expected, parser);
+  }  
 
+  // Neko overridden tests (due to Neko quirks)
+  @Override
   @Test
-  public void testNotADocument() throws Exception {
+  public void notADocument() throws Exception {
     // Note that no doctype is injected for fragments
     String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fragment.html");
     String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fragment-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
+    parseAndCompareBalanced(content, expected, parser);
   }
-
+  
+  @Override
   @Test
-  public void testNotADocument2() throws Exception {
-    // Note that no doctype is injected for fragments
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fragment2.html");
-    String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fragment2-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
-  }
-
-  @Test
-  public void testNoBody() throws Exception {
+  public void noBody() throws Exception {
     // Note that no doctype is injected for fragments
     String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-headnobody.html");
     String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-headnobody-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
+    parseAndCompareBalanced(content, expected, parser);
   }
 
+  // Overridden because of comment vs. script ordering. Neko stuffs script into head, but
+  // postprocessing moves it back down into body, *above* the comment element. This is
+  // semantically meaningless (to HTML), so we create a new test to accommodate it.
+  @Override
   @Test
-  public void testAmpersand() throws Exception {
-    // Note that no doctype is injected for fragments
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands.html");
-    String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
-  }
-
-  @Test
-  public void testScriptPushedToBody() throws Exception {
-    String content = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript.html");
-    String expected = loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html");
-    parseAndCompareBalanced(content, expected, simple);
+  public void docNoDoctype() throws Exception {
+    // Note that, correctly, no doctype is created when none is specified
+    String content = loadFile("org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html");
+    String expected =
+        loadFile("org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html");
+    assertNull(parser.parseDom(content).getDoctype());
+    parseAndCompareBalanced(content, expected, parser);
   }
 }

Modified: incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java (original)
+++ incubator/shindig/trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java Thu Dec 17 00:48:29 2009
@@ -18,141 +18,16 @@
  */
 package org.apache.shindig.gadgets.parse.nekohtml;
 
-import org.apache.commons.io.IOUtils;
-import org.apache.commons.lang.StringUtils;
+import org.apache.shindig.gadgets.parse.AbstractSocialMarkupHtmlParserTest;
 import org.apache.shindig.gadgets.parse.GadgetHtmlParser;
-import org.apache.shindig.gadgets.parse.HtmlSerialization;
 import org.apache.shindig.gadgets.parse.ParseModule;
-import org.apache.shindig.gadgets.spec.PipelinedData;
-
-import com.google.common.collect.Lists;
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertTrue;
-import static org.junit.Assert.fail;
-
-import org.junit.Before;
-import org.junit.Test;
-import org.w3c.dom.DOMException;
-import org.w3c.dom.Document;
-import org.w3c.dom.Element;
-import org.w3c.dom.Node;
-import org.w3c.dom.NodeList;
-
-import java.util.List;
 
 /**
  * Test for the social markup parser.
  */
-public class SocialMarkupHtmlParserTest {
-  private GadgetHtmlParser parser;
-  private Document document;
-
-  @Before
-  public void setUp() throws Exception {
-    parser = new NekoSimplifiedHtmlParser(new ParseModule.DOMImplementationProvider().get());
-
-    String content = IOUtils.toString(this.getClass().getClassLoader().
-        getResourceAsStream("org/apache/shindig/gadgets/parse/nekohtml/test-socialmarkup.html"));
-    document = parser.parseDom(content);
-  }
-
-  @Test
-  public void testSocialData() {
-    // Verify elements are preserved in social data
-    List<Element> scripts = getTags(GadgetHtmlParser.OSML_DATA_TAG);
-    assertEquals(1, scripts.size());
-    
-    NodeList viewerRequests = scripts.get(0).getElementsByTagNameNS(
-        PipelinedData.OPENSOCIAL_NAMESPACE, "ViewerRequest");
-    assertEquals(1, viewerRequests.getLength());
-    Element viewerRequest = (Element) viewerRequests.item(0);
-    assertEquals("viewer", viewerRequest.getAttribute("key"));
-    assertEmpty(viewerRequest);
-  }
-
-  @Test
-  public void testSocialTemplate() {
-    // Verify elements and text content are preserved in social templates
-    List<Element> scripts = getTags(GadgetHtmlParser.OSML_TEMPLATE_TAG);
-    assertEquals(1, scripts.size());
-    
-    NodeList boldElements = scripts.get(0).getElementsByTagName("b");
-    assertEquals(1, boldElements.getLength());
-    Element boldElement = (Element) boldElements.item(0);
-    assertEquals("Some ${viewer} content", boldElement.getTextContent());
-    
-    NodeList osHtmlElements = scripts.get(0).getElementsByTagNameNS(
-        "http://ns.opensocial.org/2008/markup", "Html");
-    assertEquals(1, osHtmlElements.getLength());
-  }
-
-  @Test
-  public void testSocialTemplateSerialization() {
-    String content = HtmlSerialization.serialize(document);
-    assertTrue("Empty elements not preserved as XML inside template",
-        content.contains("<img/>"));
-  }
-
-  @Test
-  public void testJavascript() {
-    // Verify text content is unmodified in javascript blocks
-    List<Element> scripts = getTags("script");
-    assertEquals(1, scripts.size());
-    
-    NodeList boldElements = scripts.get(0).getElementsByTagName("b");
-    assertEquals(0, boldElements.getLength());
-
-    String scriptContent = scripts.get(0).getTextContent().trim();
-    assertEquals("<b>Some ${viewer} content</b>", scriptContent);
-  }
-
-  @Test
-  public void testPlainContent() {
-    // Verify text content is preserved in non-script content
-    NodeList spanElements = document.getElementsByTagName("span");
-    assertEquals(1, spanElements.getLength());
-    assertEquals("Some content", spanElements.item(0).getTextContent());
-  }
-
-  @Test
-  public void testCommentOrdering() {
-    NodeList divElements = document.getElementsByTagName("div");
-    assertEquals(1, divElements.getLength());
-    NodeList children = divElements.item(0).getChildNodes();
-    assertEquals(3, children.getLength());
-    
-    // Should be comment/text/comment, not comment/comment/text
-    assertEquals(Node.COMMENT_NODE, children.item(0).getNodeType());
-    assertEquals(Node.TEXT_NODE, children.item(1).getNodeType());
-    assertEquals(Node.COMMENT_NODE, children.item(2).getNodeType());
-  }
-  
-  @Test
-  public void testInvalid() throws Exception {
-    String content = "<html><div id=\"div_super\" class=\"div_super\" valign:\"middle\"></div></html>";
-    try {
-      parser.parseDom(content);
-      fail("No exception caught");
-    } catch (DOMException e) {
-      assertTrue(e.getMessage().contains("INVALID_CHARACTER_ERR"));
-      assertTrue(e.getMessage().contains(
-          "Around ...<div id=\"div_super\" class=\"div_super\"..."));
-    }
-  }
-
-  private void assertEmpty(Node n) {
-    if (n.getChildNodes().getLength() != 0) {
-      assertTrue(StringUtils.isEmpty(n.getTextContent()) ||
-          StringUtils.isWhitespace(n.getTextContent()));
-    }
-  }
-
-  private List<Element> getTags(String tagName) {
-    NodeList list = document.getElementsByTagName(tagName);
-    List<Element> elements = Lists.newArrayListWithExpectedSize(list.getLength());
-    for (int i = 0; i < list.getLength(); i++) {
-      elements.add((Element) list.item(i));
-    }
-    return elements;
+public class SocialMarkupHtmlParserTest extends AbstractSocialMarkupHtmlParserTest {
+  @Override
+  protected GadgetHtmlParser makeParser() {
+    return new NekoSimplifiedHtmlParser(new ParseModule.DOMImplementationProvider().get());
   }
 }

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,8 @@
+<html>
+  <head><style>CSS</style></head>
+  <body>
+  <script>function foo(){}</script>
+  <!-- This is a full doc with no doctype -->
+  <div id="mydiv">DIV</div>
+  </body>
+</html>
\ No newline at end of file

Modified: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html?rev=891496&r1=891495&r2=891496&view=diff
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html (original)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html Thu Dec 17 00:48:29 2009
@@ -3,5 +3,9 @@
 
 <link rel="linkrel">
 
-</head><body><script>foo1();</script><script>foo2();</script><script>foo3();</script><div id="mydiv">mycontent</div>
+</head><body>
+<script>foo1();</script>
+<script>foo2();</script>
+<script>foo3();</script>
+<div id="mydiv">mycontent</div>
 </body></html>

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,23 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html><head id="head">
+  <link href="http://www.example.org/css.css" rel="stylesheet" type="text/css">
+  <title>An example</title>
+</head><body>
+  <!-- Some comment -->
+    <script type="text/javascript">document.write("&&&")</script>
+    <script src="http://www.example.org/1.js" type="text/javascript"></script>
+    <div>
+      <table><TBODY><tr><td>a cell</td></tr></TBODY></table>
+    </div>
+    <p>Lorem ipsum</p>
+    <a href="/test.html" title="">link</a>
+    <form action="/test/submit">
+      <div>
+        <input type="hidden" value="something">
+        <input type="text">
+      </div>
+      <div><-- An unbalanced tag we dont care about -->
+      <p>Some entities &amp;#x27;&quot;</p>
+      <p>Not a real entity &fake;</p>
+    </div></form>
+</body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,2 @@
+<html><head>
+<style type="text/css"> A { font : bold; }</style></head><body><script>document.write("dont add to head or else")</script></body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,2 @@
+<script>document.write("dont add to head or else")</script>
+<style type="text/css"> A { font : bold; }</style>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,2 @@
+<html><head><style type="text/css"> A { background-color : #7f7f7f; } </style>
+</head><body><div>A div</div></body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,2 @@
+<style type="text/css"> A { background-color : #7f7f7f; } </style>
+<div>A div</div>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,7 @@
+<html>
+  <head><style>CSS</style><script>function foo(){}</script></head>
+  <body>
+  <!-- This is a full doc with no doctype -->
+  <div id="mydiv">DIV</div>
+  </body>
+</html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,7 @@
+<html>
+  <head><style>CSS</style><script>function foo(){}</script></head>
+  <body>
+  <!-- This is a full doc with no doctype -->
+  <div id="mydiv">DIV</div>
+  </body>
+</html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,5 @@
+<html><head><style type="text/css"> A { font : bold; } </style></head><body>
+    <!-- A head tag but no body tag is not good -->
+<script>document.write("dont add to head or else")</script>
+
+</body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,5 @@
+<head>
+    <!-- A head tag but no body tag is not good -->
+</head>
+<script>document.write("dont add to head or else")</script>
+<style type="text/css"> A { font : bold; } </style>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,19 @@
+<link rel="foo"></link>
+
+<script type="text/os-data" xmlns:os="http://ns.opensocial.org/2008/markup">
+  <os:ViewerRequest key="viewer"/>
+</script>
+
+<script id="template-id" name="template-name" type="text/os-template" tag="template-tag" xmlns:os="http://ns.opensocial.org/2008/markup">
+  <b>Some ${viewer} content</b>
+  <img/>
+  <os:Html/>
+</script>
+
+<script type="text/javascript">
+  <b>Some ${viewer} content</b>
+</script>
+
+<span>Some content</span>
+
+<div><!-- foo -->bar<!-- baz --></div>

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,8 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html><head id="head">
+  <link href="http://www.example.org/css.css" rel="stylesheet" type="text/css">
+  <title>An example</title>
+</head><body>
+  <!-- Some comment -->
+  <span title="&amp;lt;">content</span>
+</body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,11 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head id="head">
+  <link href="http://www.example.org/css.css" rel="stylesheet" type="text/css">
+  <title>An example</title>
+</head>
+<body>
+  <!-- Some comment -->
+  <span title="&amp;lt;">content</span>
+</body>
+</html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,4 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html><head id="head"><link href="http://www.example.org/css.css" rel="stylesheet" type="text/css"><title>An example</title></head><body><!--[if IE 5]>
+  <p>Welcome to Internet Explorer 5.</p>
+  <![endif]--><!--[if IE]><p>You are using Internet Explorer.</p><![endif]--><!--[if !IE]><p>You are not using Internet Explorer.</p><![endif]--><!--[if IE 7]><p>Welcome to Internet Explorer 7!</p><![endif]--><!--[if !(IE 7)]><p>You are not using version 7.</p><![endif]--><!--[if gte IE 7]><p>You are using IE 7 or greater.</p><![endif]--><!--[if (IE 5)]><p>You are using IE 5 (any version).</p><![endif]--><!--[if (gte IE 5.5)&(lt IE 7)]><p>You are using IE 5.5 or IE 6.</p><![endif]--><!--[if lt IE 5.5]><p>Please upgrade your version of Internet Explorer.</p><![endif]--><!--[if true]>You are using an <em>uplevel</em> browser.<![endif]--><!--[if false]>You are using a <em>downlevel</em> browser.<![endif]--><!--[if true]><![if IE 7]><p>This nested comment is displayed in IE 7.</p><![endif]><![endif]--></body></html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,30 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head id="head">
+  <link href="http://www.example.org/css.css" rel="stylesheet" type="text/css">
+  <title>An example</title>
+</head>
+<body>
+  <!--[if IE 5]>
+  <p>Welcome to Internet Explorer 5.</p>
+  <![endif]-->
+
+  <!--[if IE]><p>You are using Internet Explorer.</p><![endif]-->
+  <!--[if !IE]><p>You are not using Internet Explorer.</p><![endif]-->
+  
+  <!--[if IE 7]><p>Welcome to Internet Explorer 7!</p><![endif]-->
+  <!--[if !(IE 7)]><p>You are not using version 7.</p><![endif]-->
+  
+  <!--[if gte IE 7]><p>You are using IE 7 or greater.</p><![endif]-->
+  <!--[if (IE 5)]><p>You are using IE 5 (any version).</p><![endif]-->
+  <!--[if (gte IE 5.5)&(lt IE 7)]><p>You are using IE 5.5 or IE 6.</p><![endif]-->
+  <!--[if lt IE 5.5]><p>Please upgrade your version of Internet Explorer.</p><![endif]-->
+  
+  <!--[if true]>You are using an <em>uplevel</em> browser.<![endif]-->
+  <!--[if false]>You are using a <em>downlevel</em> browser.<![endif]-->
+  
+  <!--[if true]><![if IE 7]><p>This nested comment is displayed in IE 7.</p><![endif]><![endif]-->
+
+  <!-- this standard comment should be removed -->
+</body>
+</html>
\ No newline at end of file

Added: incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html
URL: http://svn.apache.org/viewvc/incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html?rev=891496&view=auto
==============================================================================
--- incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html (added)
+++ incubator/shindig/trunk/java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html Thu Dec 17 00:48:29 2009
@@ -0,0 +1,33 @@
+<html><head><title>An example</title><style type="text/css">
+  <!--
+  #mymap #header {
+    background: #FF9700;
+    clear: both;
+    padding: 2px 0 1px;
+    position: relative;
+    width: 640px;
+  }
+
+  -->
+</style></head><body><script type="text/javascript">document.write("&&&")</script><script src="http://www.example.org/1.js" type="text/javascript"></script><script>
+  // scripts with no old comment hack should be preserved.
+  function a1() {
+    var v1 = 0;
+    alert(" this whitespace should be preserved.");
+  }
+</script><div><table><TBODY><tr><td>a cell</td></tr></TBODY></table></div><script type="text/javascript">
+  <!--
+  // script with old comment hack should be preserved.
+  function MM_goToURL() {
+    var i, args = MM_goToURL.arguments;
+    document.MM_returnValue = false;
+    for (i = 0; i < (args.length - 1); i += 2) eval(args[i] + ".location='" + args[i + 1] + "'");
+  }
+  //-->
+</script><p>Lorem ipsum</p><a href="/test.html" title="">link</a><pre>
+ This is a preformatted block of text,
+ and whitespaces should be preserved.
+ </pre><form action="/test/submit"><div><input type="hidden" value="something"><input type="text"><textarea>
+      This is a preformatted block of text,
+      and whitespaces should be preserved too.
+    </textarea></div></form></body></html>
\ No newline at end of file