Posted to j-dev@xerces.apache.org by Steffen <Gl...@gmx.net> on 2001/03/20 18:37:50 UTC

Please help with parsing XHTML

Hi all,

Since nobody replied to my posting a few days ago, I am posting
a specific code example that is giving me a headache. I hope someone
can explain this behaviour to me, because I can't.

I use Xerces 1.3.1 to parse XHTML with the following code:

import java.io.ByteArrayInputStream;

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.InputSource;

// The class name is arbitrary; it only makes the example compilable.
public class ParseXhtml {

    public static void main(String[] args) {

        // Test document, kept as one logical line (the mail client had wrapped it).
        String xhtmlString =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"> "
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">  <head>    "
            + "<title>Virtual Library</title>  </head>  <body>    "
            + "<p>Moved to ae&auml;    <a href=\"http://vlib.org/\">vlib.org</a>"
            + "    .</p>  </body></html>";

        ByteArrayInputStream origStream =
            new ByteArrayInputStream(xhtmlString.getBytes());
        InputSource origInput = new InputSource(origStream);

        DOMParser domParser = new DOMParser();

        try {
            // domParser.setFeature(
            //     "http://apache.org/xml/features/validation/dynamic", true);

            // Build the DOM with the HTML DOM implementation.
            domParser.setProperty(
                "http://apache.org/xml/properties/dom/document-class-name",
                "org.apache.html.dom.HTMLDocumentImpl");
            // domParser.setFeature(
            //     "http://apache.org/xml/features/dom/include-ignorable-whitespace",
            //     false);

            // Validate against the DTD named in the DOCTYPE.
            domParser.setFeature("http://xml.org/sax/features/validation", true);
        } catch (Exception e) {
            System.out.println("error in setting up parser property: "
                + e.getMessage());
        }

        org.w3c.dom.Document htmlDocument = null;

        try {
            domParser.parse(origInput);
            System.out.println("Parse Success");
        } catch (Exception e) {
            System.out.println("Exception :" + e.getMessage());
            // e.printStackTrace();
        }
    }
}

I get the exception: The attribute type is required in the declaration
of attribute "events" for element "html".

I know that the xhtmlString is correct; it is taken from the XHTML
specification at w3.org.
The EntityResolver retrieves the DTDs specified in the DOCTYPE from the
web, so they should be correct too.
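
One way to check that assumption is to log which external entities the parser actually requests. The following is only a minimal sketch using the standard SAX EntityResolver interface; the class name LoggingResolver and its requests() method are hypothetical names introduced for illustration, not part of Xerces.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: records every external entity the parser asks for.
class LoggingResolver implements EntityResolver {

    private final List<String> requests = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Remember and print the identifiers of the requested entity.
        requests.add(publicId + " -> " + systemId);
        System.out.println("resolving: " + publicId + " -> " + systemId);
        // Returning null asks the parser to open the system identifier itself.
        return null;
    }

    // Everything requested so far, in request order.
    List<String> requests() {
        return requests;
    }
}
```

Assuming the parser exposes the usual SAX-style setter, it would presumably be installed with domParser.setEntityResolver(new LoggingResolver()) before calling parse(). The output should then show whether the DTD and its entity files really come from the expected URLs.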

I assume the exception is thrown while parsing the DTDs, but the message
is very strange, because the declaration for element "html" reads:

....
<!ELEMENT html (head, body)>
<!ATTLIST html
  %i18n;
  xmlns       %URI;          #FIXED 'http://www.w3.org/1999/xhtml'
  >
...

There is not even an attribute "events" in it.
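
To find out where in the DTD the parser actually fails, the location carried by the SAXParseException could be printed. This is a rough sketch using only the standard SAX ErrorHandler interface; the class name LocatingErrorHandler and its describe() and reports() methods are hypothetical and only serve as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports the file, line and column of each problem.
class LocatingErrorHandler implements ErrorHandler {

    private final List<String> reports = new ArrayList<>();

    // Formats a parse exception as "systemId:line:column: message".
    static String describe(SAXParseException e) {
        return e.getSystemId() + ":" + e.getLineNumber() + ":"
            + e.getColumnNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        reports.add("warning " + describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        reports.add("error " + describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        reports.add("fatal " + describe(e));
        throw e; // a fatal error still stops the parse
    }

    // All collected reports, in the order they occurred.
    List<String> reports() {
        return reports;
    }
}
```

Assuming it is registered with domParser.setErrorHandler(new LocatingErrorHandler()) before parse(), the system identifier in each report should tell whether the failing declaration is in the DTD itself or in one of the files it pulls in.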

What am I doing wrong? Or is there a bug in Xerces when parsing the XHTML
DTD (I can't believe it)?

In the method where the exception is thrown,
org.apache.xerces.framework.XMLDTDScanner.scanAttlistDecl(), there is a comment:

 ...
   decreaseMarkupDepth();
   return;
  }
  // REVISIT - review this code...
  if (!sawSpace) {
   if (fEntityReader.lookingAtSpace(true)) {
    fEntityReader.skipPastSpaces();
   }
.....


Please help me out of my confusion.
Thanks,

Steffen Glomb

