Posted to c-dev@xerces.apache.org by Sean Hoffman <se...@home.com> on 2000/10/14 15:15:51 UTC

My head is spinning here.. some basic xerces questions....

I'm trying to wrap my head around the class library here, so please bear with me.  Some pretty basic questions I have here that someone might be able to answer quickly. 

1.  QName:  A QName is the name of an element if it is not defined in a namespace, correct?

2.  "PI".  What do those two letters stand for (aside from the mathematical and the movie ;-) )?  I see it when I'm digging through XMLScanner (scanPI(), etc..)

3.  DFAContentModel.  What does the DFA stand for?  It sounds strange, but understanding the abbreviations helps me to understand the class library.
 
4.  Within an XMLValidator (and specifically, a DTDValidator).  "Id" seems to be the fundamental means of representing a DTD element within a context for validation (Is it correct to say that this is a "Pool"?).  If that is indeed the case, does one use addOrFindNSId to go from an element name to an Id?


For context, I've got two potentially complex problems that I'm trying to figure out how to solve:

A.  These guys add their own XML text to an already validated document.  I need to scan this raw text and convert it into something that..

B.  A DTDValidator (or other XMLValidator) can then verify.


With regards to A, I think I'm on my own on this one, because most of the goodies inside of XMLScanner (the class that does all this good stuff in xerces) are private.  This is hairy, but I think I can get something together.

With regards to B, the light hasn't turned on for me for how to go from an element in a parsed package, a DOM_Element in a DOM_Document, for example, to its element ID inside the DTDValidator, so I can find out that it is indeed valid to make the change they're trying to make.


Finally, I noticed something which was a bit of a surprise to me.  If I create my own DTDValidator and pass it into a DOMParser, DOMParser will still delete the validator on destruction (it assumes that it owns the object).  This may be the intended behavior, but it might be helpful for someone else to know that you're handing over control of the validator to the parser when you pass it in to the constructor.


Re: My head is spinning here.. some basic xerces questions....

Posted by Dean Roddey <dr...@charmedquark.com>.
"1.  QName:  A QName is the name of an element if it is not defined in a
namespace, correct?"

Not really. QName just means the full lexical name as it is found in the XML
file, if it has a prefix, then it includes that.
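
To make the prefix point concrete, here is a minimal sketch in plain C++ (not Xerces code) that splits a QName at its first colon. The QNameParts struct and the splitQName function are hypothetical names made up for this illustration.

```cpp
#include <string>

// Hypothetical helper type, not part of Xerces: the two halves of a QName.
struct QNameParts {
    std::string prefix;     // empty when the name has no prefix
    std::string localPart;  // the part after the colon, or the whole name
};

// Split a QName such as "xsl:template" at the first colon.
QNameParts splitQName(const std::string& qname) {
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        // No prefix: the QName and the local part are the same string.
        return QNameParts{std::string(), qname};
    }
    return QNameParts{qname.substr(0, colon), qname.substr(colon + 1)};
}
```

So "xsl:template" yields the prefix "xsl" and the local part "template", while plain "foo" is still a QName, just one with an empty prefix.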

"2.  "PI".  What do those two letters stand for (aside from the mathematical
and the movie ;-) )?  I see it when I'm digging through XMLScanner
(scanPI(), etc..)"

Processing Instruction. It's a means for the writer of an XML file to pass
instructions through to the program that is parsing it, or to the program
that is using the parsed data.
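
In the file, a processing instruction has the form <?target data?>. The following is a rough sketch in plain C++ of pulling the two parts apart. It is not how XMLScanner::scanPI() is implemented; ProcessingInstruction and parsePI are hypothetical names, and the sketch does not check that the target is a legal XML name.

```cpp
#include <string>

// Hypothetical helper type, not part of Xerces.
struct ProcessingInstruction {
    std::string target;  // the name right after "<?"
    std::string data;    // everything after the target, up to "?>"
};

// Parse text of the form "<?target data?>". Returns false if the text is
// not wrapped in "<?" and "?>" or if the target is empty.
bool parsePI(const std::string& text, ProcessingInstruction& out) {
    if (text.size() < 4 || text.compare(0, 2, "<?") != 0 ||
        text.compare(text.size() - 2, 2, "?>") != 0) {
        return false;
    }
    const std::string body = text.substr(2, text.size() - 4);
    const std::string::size_type ws = body.find_first_of(" \t\r\n");
    out.target = body.substr(0, ws);  // whole body when there is no whitespace
    if (out.target.empty()) {
        return false;
    }
    if (ws == std::string::npos) {
        out.data.clear();             // a PI may have a target and no data
        return true;
    }
    const std::string::size_type start = body.find_first_not_of(" \t\r\n", ws);
    out.data = (start == std::string::npos) ? std::string() : body.substr(start);
    return true;
}
```

The parser itself does not interpret the data part. It just hands the target and data to the application, which decides what, if anything, to do with them.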

"3.  DFAContentModel.  What does the DFA stand for?  It sounds strange, but
In trying to understand it understanding the abbreviations helps me to
understand the class library."

Deterministic Finite Automata. It's a way of processing regular expressions,
of which XML content models are an example. It basically reduces a set of
NFAs (Non-deterministic Finite Automata) into a single deterministic one.
This implies that certain limitations be placed on the form of the NFAs,
and the XML content model mechanism is partly the way it is because of those
limitations.
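
As an illustration only, here is a hand-built DFA in plain C++ for a made-up content model (title, para*). It is not taken from DFAContentModel, which builds its tables from the DTD; the symbols, table, and function name here are assumptions for the sketch. The point is the "deterministic" part: for each state and each child element there is at most one next state, so validation is one table lookup per child.

```cpp
#include <cstddef>
#include <vector>

// Child element types, represented as small integers rather than names.
enum Symbol { TITLE = 0, PARA = 1, SYMBOL_COUNT = 2 };

const int NO_STATE = -1;  // marks "no legal transition"

// Transition table for the hypothetical content model (title, para*).
// Row = current state, column = child element symbol.
const int kTransitions[2][SYMBOL_COUNT] = {
    /* state 0: nothing seen yet */ { 1,        NO_STATE },
    /* state 1: title seen       */ { NO_STATE, 1        },
};

// State 1 is the only state in which the content may legally end.
const bool kAccepting[2] = { false, true };

// Returns true if the sequence of child elements matches the content model.
bool matchesContentModel(const std::vector<Symbol>& children) {
    int state = 0;
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i] >= SYMBOL_COUNT) {
            return false;  // unknown symbol
        }
        state = kTransitions[state][children[i]];
        if (state == NO_STATE) {
            return false;  // this child is not allowed at this position
        }
    }
    return kAccepting[state];
}
```

A title followed by any number of paras is accepted. An empty sequence, a leading para, or a second title is rejected.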

"4.  Within an XMLValidator (and specifically, a DTDValidator).  "Id" seems
to be the fundamental means of representing a DTD element within a context
for validation (Is it correct to say that this is a "Pool"?).  If that is
indeed the case, does one use addOrFindNSId to go from an element name to an
Id?"

Validators have to hold all of the definitions of elements, attributes,
entities, notations, etc... from a DTD. These are stored in a data structure
called a Pool, which is really just a high level 'collection' of objects,
they are templatized of course so that they can be reused for all of the
above types of things.

All of these things have names, and that is a primary way via which they
will be located later. So the core data structure within a pool is a hash
table. This allows for fast lookup by name. So later, when we see a start
element for <foo>, we can look up the foo name in the element pool and find
out if it was defined as a legal element type, and get the DTD information
about it (which we need to know what its attributes are and such.)

However, within the parser, constantly having to hash names for look up by
name would have a huge overhead burden. In particular when we do content
model validation, that could involve millions of operations on a big file
with a complex DTD. So for everything that's added to a pool, a unique
(within that pool, which means for that particular type of thing) id is
assigned to it as well, and that id is returned. So, within the parser, we
tend to store ids and not names. This drastically reduces memory and makes
access to the DTD definition very fast.

So a pool is really a hash table and an ancillary lookup table. The lookup
table is just a vector. The entry for id x is a pointer into the hash table
for the entry that has that id. Since the ids are assigned sequentially as
each item is added, this table is naturally sorted by id, so an id can be
used to directly access it. So access via id is really just a double
indirection through the id table.
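
The description above can be sketched in plain C++ as a hash table plus a vector of pointers into it. This is not the actual Xerces pool code; the NamePool class and its member names are hypothetical, and std::unordered_map stands in for whatever hash table the library uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a name/id pool; not the actual Xerces classes.
template <typename T>
class NamePool {
public:
    NamePool() = default;
    // Copying would leave byId_ pointing into the other pool's hash table.
    NamePool(const NamePool&) = delete;
    NamePool& operator=(const NamePool&) = delete;

    // Add a named item if it is not already present; return its id either way.
    std::size_t addOrFind(const std::string& name, const T& value) {
        auto it = byName_.find(name);
        if (it != byName_.end()) {
            return it->second.id;
        }
        const std::size_t id = byId_.size();  // ids are assigned sequentially
        auto inserted = byName_.emplace(name, Entry{id, value}).first;
        byId_.push_back(&inserted->second);   // slot `id` points at the hash entry
        return id;
    }

    // Slow path: hash the name. Returns nullptr if the name is unknown.
    const T* findByName(const std::string& name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? nullptr : &it->second.value;
    }

    // Fast path: index the vector, then follow the pointer (double indirection).
    const T* findById(std::size_t id) const {
        return id < byId_.size() ? &byId_[id]->value : nullptr;
    }

private:
    struct Entry { std::size_t id; T value; };
    // Pointers to unordered_map elements stay valid when the table rehashes.
    std::unordered_map<std::string, Entry> byName_;  // lookup by name
    std::vector<Entry*> byId_;                       // lookup by id
};
```

With a pool like this, the name is hashed once when the declaration or start tag is first seen, and everything afterwards (content model checks in particular) can work with the integer id.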


"B.  A DTDValidator (or other XMLValidator) can then verify."

Just spit out the new file to file or into memory and parse it with
validation turned on.


"Finally, I noticed something which was a bit of a surprise to me.  If I
create my own DTDValidator and pass it into a DOMParser, DOMParser will
still delete the validator on destruction (it assumes that it owns the
object).  This may be the intended behavior, but it might be helpful to
someone else that you're handing over control of the validator to the parser
when you pass it in to the constructor.  "

If you look at the parameter name, I think it will be something like
valToAdopt. The "ToAdopt" part means that it's going to adopt it and own it.
DOMParser isn't deleting it. The scanner is doing that.
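
For anyone unfamiliar with the convention, here is a small sketch of what "adopting" means, using stand-in classes rather than the real Xerces ones. Validator and Scanner below are hypothetical, and the use of std::unique_ptr is just a convenient way to express the ownership in the sketch, not a claim about how the library stores the pointer.

```cpp
#include <memory>

// Hypothetical stand-in for a validator. It records its own destruction in
// a caller-supplied flag so the ownership transfer can be observed.
struct Validator {
    explicit Validator(bool* destroyedFlag) : destroyed_(destroyedFlag) {}
    ~Validator() { if (destroyed_) *destroyed_ = true; }
private:
    bool* destroyed_;
};

// Hypothetical stand-in for the scanner.
class Scanner {
public:
    // The "ToAdopt" suffix tells the caller that ownership is transferred:
    // the scanner will delete the validator, so the caller must not.
    explicit Scanner(Validator* valToAdopt) : validator_(valToAdopt) {}
    // No explicit destructor needed: unique_ptr deletes the adopted object.
private:
    std::unique_ptr<Validator> validator_;
};
```

The practical consequence is that the validator has to be heap-allocated with new and must not be deleted (or placed on the stack) by the code that passes it in.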

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"It takes two buttocks to make friction"
    - African Proverb



Re: My head is spinning here.. some basic xerces questions....

Posted by Sean Hoffman <se...@home.com>.
Duh on the PI part on my part.  It's only in the XML spec.  (embarrassed look
inserted here).


----- Original Message -----
From: "Radovan Chytracek" <Ra...@cern.ch>
To: <xe...@xml.apache.org>
Sent: Saturday, October 14, 2000 10:05 AM
Subject: RE: My head is spinning here.. some basic xerces questions....


> > 2.  "PI".  What do those two letters stand for (aside from
> > the mathematical and the movie ;-) )?  I see it when I'm
> > digging through XMLScanner (scanPI(), etc..)
>
> These letters mean Processing Instruction.
>
>
> > 3.  DFAContentModel.  What does the DFA stand for?
> > It sounds strange, but In trying to understand it
> > understanding the abbreviations helps me to understand the
> > class library.
>
> DFA - Deterministic Finite Automata
>
> Hope this helps
>
>                  Rado


RE: My head is spinning here.. some basic xerces questions....

Posted by Radovan Chytracek <Ra...@cern.ch>.
> 2.  "PI".  What do those two letters stand for (aside from
> the mathematical and the movie ;-) )?  I see it when I'm
> digging through XMLScanner (scanPI(), etc..) 

These letters mean Processing Instruction.
 
 
> 3.  DFAContentModel.  What does the DFA stand for?
> It sounds strange, but In trying to understand it
> understanding the abbreviations helps me to understand the
> class library.

DFA - Deterministic Finite Automata

Hope this helps

                 Rado