Posted to c-users@xerces.apache.org by co...@gmail.com on 2006/04/21 02:47:26 UTC

Extracting DTD information

Hello,

I'm trying to use Xerces-C++ 2.7 to read information from a DTD. The
code below reads the following entry in a DTD:
<!ELEMENT a (x, y, z)>

// Load the DTD
XercesDOMParser parser;
DTDGrammar* grammar =
    (DTDGrammar*) parser.loadGrammar(dtdPath, Grammar::DTDGrammarType);

// Get a specific DTD element
DTDElementDecl* elementDecl =
    (DTDElementDecl*) grammar->getElemDecl(0, nil, objName, 0); // objName is "a"

// Get information about the children of this element
ContentSpecNode* specNode = elementDecl->getContentSpec();
DFAContentModel contentModel(true, specNode);

// Question 1: Why does this return nil?
ContentLeafNameTypeVector* content =
    contentModel.getContentLeafNameTypeVector();

// Question 2: Why does this line return (child count + 1), in this case 4?
int leafCount = content->getLeafCount();

Question 1: The only way I can see to access the list of children,
with their names and types (exactly one, zero or one, zero or more,
one or more, etc.), is to use getContentLeafNameTypeVector.
Unfortunately, this returns nil. The reason appears to be the check at
line 471 of DFAContentModel::buildDFA:

if ( (fLeafListType[outIndex] & 0x0f) != ContentSpecNode::Leaf )

I'm assuming that ContentSpecNode::Leaf means exactly once. So the
net effect of this line is that if every child is a leaf node, the
ContentLeafNameTypeVector isn't populated. Why?

Question 2: If I patch the code to ignore that check, the next problem
is that the count returned is (child count + 1), so for the DTD
element listed above, 4 is returned. The comments in the code indicate
that the last node is the end-of-content (EOC) node, which the
implementation of DFAContentModel needs to "get rid of any repetition
short cuts". Can I assume that I will always get child count + 1?

Cheers!

Re: Extracting DTD information

Posted by co...@gmail.com.
After more careful review of the code I now realize that accessing the
children of an entry in the DTD can be accomplished by using
ContentSpecNode and that I don't need/shouldn't use DFAContentModel.

I was looking for a list of objects similar to the list of
attributes (getAttDefList), but now realize that the data structure is
actually a tree.

formatNode in ContentSpecNode.cpp is a good example of how to navigate the tree.
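
For anyone who finds this later, here is a rough sketch of the kind of tree walk I mean. It deliberately does not use the Xerces classes: Node, Kind, Child and collectChildren are hypothetical stand-ins I made up so the example compiles on its own. As far as I can tell, the real ContentSpecNode exposes the same shape through getType(), getFirst(), getSecond() and getElement(), but check that against the headers for your version. The sketch also simplifies things by tagging each leaf with the nearest enclosing repetition operator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a content-spec tree node (NOT the Xerces class).
enum class Kind { Leaf, ZeroOrOne, ZeroOrMore, OneOrMore, Choice, Sequence };

struct Node {
    Kind kind;
    std::string name;    // meaningful only when kind == Kind::Leaf
    const Node* first;   // only child of a unary node, left child of a binary node
    const Node* second;  // right child of a binary node, otherwise nullptr
};

// One child element plus the repetition operator found above it ("" = exactly one).
struct Child {
    std::string name;
    std::string occurrence;
};

// Depth-first, left-to-right walk that appends every leaf to `out`.
// Simplification: a leaf inherits the nearest enclosing ?, * or + operator.
void collectChildren(const Node* node, const std::string& occurrence,
                     std::vector<Child>& out) {
    if (!node) return;
    switch (node->kind) {
    case Kind::Leaf:
        out.push_back({node->name, occurrence});
        break;
    case Kind::ZeroOrOne:
        collectChildren(node->first, "?", out);
        break;
    case Kind::ZeroOrMore:
        collectChildren(node->first, "*", out);
        break;
    case Kind::OneOrMore:
        collectChildren(node->first, "+", out);
        break;
    case Kind::Choice:
    case Kind::Sequence:
        // Binary nodes are expected to carry both children.
        assert(node->first && node->second);
        collectChildren(node->first, occurrence, out);
        collectChildren(node->second, occurrence, out);
        break;
    }
}
```

For (x, y, z) I would expect a binary tree nested along the lines of Sequence(Sequence(x, y), z); calling collectChildren(root, "", out) on such a tree fills out with x, y and z in order and nothing extra, so there is no EOC entry to account for.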

Cheers!
