Posted to j-users@xerces.apache.org by Florian Roth <fl...@synatic.net> on 2004/12/16 16:19:19 UTC

Parse DTD

Hi,

I only want to parse a DTD, without validating an XML document against it. I 
want to know which attributes a certain element can contain, so I dug around 
in the Xerces docs but found nothing that satisfies my needs. Any ideas?

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Parse DTD

Posted by Martin Vysny <vy...@host.sk>.
Florian Roth wrote:

>Hi,
>
>I only want to parse a DTD, without validating an XML document against it. I 
>want to know which attributes a certain element can contain, so I dug around 
>in the Xerces docs but found nothing that satisfies my needs. Any ideas?
>
I use this code when parsing XML to catch all plain-text (internal) entity 
declarations. Hope that helps. Just use it as a normal parser.

    // Imports needed by the enclosing source file (the outer class is not shown):
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.xerces.impl.Constants;
    import org.apache.xerces.parsers.DOMParser;
    import org.apache.xerces.xni.Augmentations;
    import org.apache.xerces.xni.XMLString;
    import org.apache.xerces.xni.XNIException;
    import org.xml.sax.SAXNotRecognizedException;
    import org.xml.sax.SAXNotSupportedException;

    /**
     * We are going to parse the XML via Xerces's XNI methods. This is the
     * instance of DOMParser used for parsing documents. With its help we are
     * going to catch all entity declarations correctly.
     */
    private class Parser extends DOMParser {
        /**
         * Contains mapping between the entity name and entity value. Maps
         * String to String.
         */
        public final Map<String, String> textEntities =
                new HashMap<String, String>();
        public Parser(URLDir root) {
            super();
            try {
                setFeature(Constants.SAX_FEATURE_PREFIX
                        + Constants.VALIDATION_FEATURE, false);
                // "namespaceAware" == SAX Namespaces feature
                setFeature(Constants.SAX_FEATURE_PREFIX
                        + Constants.NAMESPACES_FEATURE, true);
                // Set various parameters obtained from DocumentBuilderFactory
                setFeature(Constants.XERCES_FEATURE_PREFIX
                        + Constants.INCLUDE_IGNORABLE_WHITESPACE, false);
                setFeature(Constants.XERCES_FEATURE_PREFIX
                        + Constants.CREATE_ENTITY_REF_NODES_FEATURE, true);
                setFeature(Constants.XERCES_FEATURE_PREFIX
                        + Constants.INCLUDE_COMMENTS_FEATURE, true);
                setFeature(Constants.XERCES_FEATURE_PREFIX
                        + Constants.CREATE_CDATA_NODES_FEATURE, false);
            } catch (SAXNotRecognizedException ex) {
                throw new Error(ex);
            } catch (SAXNotSupportedException ex) {
                throw new Error(ex);
            }
        }
        /*
         * (non-Javadoc)
         * @see org.apache.xerces.parsers.XMLParser#reset()
         */
        public void reset() throws XNIException {
            textEntities.clear();
            super.reset();
        }
        /*
         * (non-Javadoc)
         * @see org.apache.xerces.xni.XMLDTDHandler#internalEntityDecl(java.lang.String,
         * org.apache.xerces.xni.XMLString, org.apache.xerces.xni.XMLString,
         * org.apache.xerces.xni.Augmentations)
         */
        public void internalEntityDecl(String name, XMLString text,
                XMLString nonNormalizedText, Augmentations augs)
                throws XNIException {
            if (name.charAt(0) != '%') {
                // textual entity. store it into the map. Intern them because
                // they may be created multiple times in the document.
                textEntities.put(name.intern(), text.toString().intern());
            }
            super.internalEntityDecl(name, text, nonNormalizedText, augs);
        }
    }
}
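
For the original question (which attributes an element can contain), a similar
callback approach also seems possible with the plain SAX2 extension API in the
JDK, without subclassing DOMParser. The following is only a minimal sketch and
not part of the code above: the class name DtdAttributeLister and the sample
DTD are made up for illustration, and it assumes the DTD can be reached from a
small stub document (here through an internal subset held in a string).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper class, named for illustration only.
public class DtdAttributeLister {

    /** Records, per element name, the attribute names declared in the DTD. */
    static class AttributeCollector extends DefaultHandler2 {
        final Map<String, List<String>> attributes = new LinkedHashMap<>();

        @Override
        public void attributeDecl(String eName, String aName, String type,
                String mode, String value) {
            // Called once for each attribute declared in an <!ATTLIST ...>.
            attributes.computeIfAbsent(eName, k -> new ArrayList<>()).add(aName);
        }
    }

    /** Parses a document that carries a DOCTYPE and returns the declared attributes. */
    public static Map<String, List<String>> listAttributes(String xmlWithDoctype)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // read the DTD, but do not validate against it
        XMLReader reader = factory.newSAXParser().getXMLReader();
        AttributeCollector collector = new AttributeCollector();
        // Standard SAX2 property for receiving DTD declaration events.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler",
                collector);
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(xmlWithDoctype)));
        return collector.attributes;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample DTD, embedded as an internal subset of a stub document.
        String stub =
              "<!DOCTYPE book [\n"
            + "  <!ELEMENT book (title)>\n"
            + "  <!ATTLIST book isbn CDATA #REQUIRED lang CDATA \"en\">\n"
            + "  <!ELEMENT title (#PCDATA)>\n"
            + "  <!ATTLIST title short (yes|no) \"no\">\n"
            + "]>\n"
            + "<book isbn=\"x\"><title>t</title></book>";
        System.out.println(listAttributes(stub));
    }
}
```

For a DTD that lives in its own file, the stub document would presumably
reference it with a SYSTEM identifier in the DOCTYPE instead of an internal
subset. Whether the parser then loads the external DTD depends on its
configuration.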



Re: Parse DTD

Posted by Andy Clark <an...@cyberneko.net>.
Florian Roth wrote:
> Thanks for your help.
> That's exactly what I wanted, but the generated xml looks very 
> confusing ... :)

Not so confusing if you are able to read the DTD definition.
Check out the data/dtd/dtdx.dtd grammar file included with the
package for complete details.

> So now I have to get the information I need out of it ... OK, it's just one 
> week until vacation, so I should have enough time then.

Depending on what you need to do, both DOM and XSLT are useful
tools. For example, using DOM, you can query all of the element
declarations like so:

   NodeList elemDecls = document.getElementsByTagName("elementDecl");
   int count = elemDecls.getLength();
   for (int i = 0; i < count; i++) {
     Element elemDecl = (Element)elemDecls.item(i);
     System.out.println("element: "+elemDecl.getAttribute("ename"));
   }
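
A self-contained variant of the loop above could look like the sketch below.
Note the assumptions: the sample XML string is hand-written for illustration
and is not real NekoDTD output (only the "elementDecl" element and its "ename"
attribute are taken from the snippet above), and the class name
ElementDeclLister is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
public class ElementDeclLister {

    /** Returns the "ename" attribute of every elementDecl element in the input. */
    public static List<String> elementNames(String dtdAsXml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document =
                builder.parse(new InputSource(new StringReader(dtdAsXml)));
        NodeList elemDecls = document.getElementsByTagName("elementDecl");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < elemDecls.getLength(); i++) {
            Element elemDecl = (Element) elemDecls.item(i);
            names.add(elemDecl.getAttribute("ename"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for the XML form of a DTD; the root element
        // name "dtd" is an assumption, not taken from the NekoDTD grammar.
        String sample = "<dtd>"
                + "<elementDecl ename='book'/>"
                + "<elementDecl ename='title'/>"
                + "</dtd>";
        for (String name : elementNames(sample)) {
            System.out.println("element: " + name);
        }
    }
}
```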

Good luck and Happy Holidays!

-- 
Andy Clark * andyc@apache.org



Re: Parse DTD

Posted by Florian Roth <fl...@synatic.net>.
On Friday, 17 December 2004 04:19, Andy Clark wrote:
> Florian Roth wrote:
> > I only want to parse a DTD, without validating an XML document against
> > it. I want to know which attributes a certain element can contain, so I
> > dug around in the Xerces docs but found nothing that satisfies
> > my needs. Any ideas?
>
> The NekoDTD project uses Xerces2 and allows you to parse DTD
> documents, returning their contents as an XML document. Then
> you can use standard XML tools to pull out the info that you
> want. Here's the link:
>
>    http://www.apache.org/~andyc/neko/

Thanks for your help.
That's exactly what I wanted, but the generated xml looks very 
confusing ... :)
So now I have to get the information I need out of it ... OK, it's just one 
week until vacation, so I should have enough time then.

Greets
Florian



Re: Parse DTD

Posted by Andy Clark <an...@cyberneko.net>.
Florian Roth wrote:
> I only want to parse a DTD, without validating an XML document against
> it. I want to know which attributes a certain element can contain, so I
> dug around in the Xerces docs but found nothing that satisfies
> my needs. Any ideas?

The NekoDTD project uses Xerces2 and allows you to parse DTD
documents, returning their contents as an XML document. Then
you can use standard XML tools to pull out the info that you
want. Here's the link:

   http://www.apache.org/~andyc/neko/

-- 
Andy Clark * andyc@apache.org
