
Posted to general@xml.apache.org by "Andrews, Scott" <An...@ctcnsc.org> on 2004/01/13 15:04:55 UTC

How do I parse a DTD in Java?

How do I parse a DTD into an in-memory Java object, like a TreeMap or
perhaps some XML specific collection class?

 

I asked this question the other day, and got an answer that the
DocumentBuilder parse method should handle the parsing of a DTD - since
a DTD IS XML.

 

However, I get basic parsing errors when inputting a simple DTD.  The
code works fine on XML documents, but not on DTDs.  The code I'm using
to parse the DTD looks like this:

 

     import java.io.File;
     import javax.xml.parsers.DocumentBuilder;
     import javax.xml.parsers.DocumentBuilderFactory;
     import org.w3c.dom.Document;
     import org.w3c.dom.Node;
     import org.w3c.dom.NodeList;

     // The class name is a placeholder; only the two methods matter here.
     public class DtdParseAttempt {

         public static void main( String[] argArgs ) throws Exception {
             File dtdFile = new File( "C:\\APIS\\WorkSpace\\tv.dtd" );
             DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
             DocumentBuilder db = dbf.newDocumentBuilder();
             // This is the call that fails when the input file is a DTD.
             Document document = db.parse( dtdFile );
             parseChildrenRecursivly( document.getChildNodes() );
         }

         // Prints every non-text node, then descends into its children.
         public static void parseChildrenRecursivly( NodeList argNodeList ) {
             if (argNodeList == null) {
                 return;
             }
             for (int i = 0; i < argNodeList.getLength(); i++) {
                 Node node = argNodeList.item( i );
                 if (node.getNodeType() != Node.TEXT_NODE) {
                     System.out.println(
                         "node.nodeName = " + node.getNodeName() + "; " +
                         "node.nodeType = " + Short.toString( node.getNodeType() ) + "; " +
                         "node.localName = " + node.getLocalName() + "; " +
                         "node.namespaceUri = " + node.getNamespaceURI() + "; " +
                         "node.nodeValue = " + node.getNodeValue() + "; "
                     );
                     parseChildrenRecursivly( node.getChildNodes() );
                 }
             } // for
         }
     }

 

However, I get errors when making the attempt:

 

[Fatal Error] :-1:-1: Premature end of file.
ERR:> Exception Premature end of file.

 

The DTD I'm trying to parse is just an example.  It looks like this,
where the elements are embedded inside the DOCTYPE tag:

 

<!DOCTYPE TVSCHEDULE [

<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>

<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>

]>

 

If I just parse the ELEMENTS, by removing the DOCTYPE tag, I still get
errors:

 

Exception The markup in the document preceding the root element must be well-formed.
[Fatal Error] tv.dtd:3:3: The markup in the document preceding the root element must be well-formed.

 

Anybody have a clue how to parse a DTD, so I can get an in-memory
structure of the DTD in Java?

 

 

  _____

Scott Andrews
Principal Software Engineer
Concurrent Technologies Corporation
(814) 269 6580 (Monday, Wednesday, Friday)
(814) 632 9559 (Tuesday, Thursday)
(814) 880 8522 (Cell)

Re: How do I parse a DTD in Java?

Posted by Anne Thomas Manes <an...@manes.net>.

That's because a DTD is NOT XML.
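
A DOCTYPE declaration on its own is not a complete document (there is no root element after it), and bare <!ELEMENT ...> lines are not element markup, which would explain both errors. One way to get at the declarations, offered only as a rough sketch, is SAX's DeclHandler: the parser reports each element and attribute declaration as it reads the DTD, and you can store them in whatever collection you like. The sketch below assumes the JDK's bundled parser, which supports the declaration-handler property; the names DtdDeclarations, Collector and readDeclarations are made up for illustration, and the DTD is wrapped in a tiny document with an empty root element so that the input is well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.helpers.DefaultHandler;

public class DtdDeclarations {

    /** Hypothetical holder: stores the declarations the parser reports. */
    static class Collector implements DeclHandler {
        // element name -> content model string as reported by the parser
        final Map<String, String> elements = new LinkedHashMap<>();
        // element name -> names of the attributes declared for it
        final Map<String, List<String>> attributes = new LinkedHashMap<>();

        @Override
        public void elementDecl(String name, String model) {
            elements.put(name, model);
        }

        @Override
        public void attributeDecl(String eName, String aName, String type,
                                  String mode, String value) {
            attributes.computeIfAbsent(eName, k -> new ArrayList<>()).add(aName);
        }

        @Override
        public void internalEntityDecl(String name, String value) { }

        @Override
        public void externalEntityDecl(String name, String publicId,
                                       String systemId) { }
    }

    /** Parses a document that carries a DOCTYPE and returns its declarations. */
    static Collector readDeclarations(String xmlWithDoctype) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Collector collector = new Collector();
        // Standard SAX2 extension property for DTD declaration events.
        reader.setProperty(
            "http://xml.org/sax/properties/declaration-handler", collector);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xmlWithDoctype)));
        return collector;
    }

    public static void main(String[] args) throws Exception {
        // A few declarations from the TVSCHEDULE DTD, followed by an empty
        // root element. The parser is non-validating, so the root does not
        // have to satisfy the content model.
        String doc =
            "<!DOCTYPE TVSCHEDULE [\n"
            + "<!ELEMENT TVSCHEDULE (CHANNEL+)>\n"
            + "<!ELEMENT CHANNEL (BANNER, DAY+)>\n"
            + "<!ELEMENT BANNER (#PCDATA)>\n"
            + "<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>\n"
            + "<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>\n"
            + "]>\n"
            + "<TVSCHEDULE NAME=\"example\"/>";

        Collector c = readDeclarations(doc);
        c.elements.forEach((name, model) ->
            System.out.println("element " + name + " model " + model));
        c.attributes.forEach((element, names) ->
            System.out.println("attributes of " + element + ": " + names));
    }
}
```

The content model arrives as a string, so turning it into a tree would still be up to you. For a standalone tv.dtd holding only the declarations (no DOCTYPE wrapper), a small document that starts with <!DOCTYPE TVSCHEDULE SYSTEM "tv.dtd"> should in principle work the same way, provided the parser is configured to load external DTDs.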



