Posted to j-dev@xerces.apache.org by Jessica Perry Hekman <jp...@arborius.net> on 2000/10/06 00:07:07 UTC

More on the fly validation troubles

I'm continuing to struggle with using Xerces-J (1.2.0) for on-the-fly
validation. Last week, I wrote the list to report that I wanted to use
XMLValidator.whatCanGoHere, but couldn't because it was protected and
XMLValidator was final. I have since modified the source to make the
method public and am now trying to use it. I'm failing.

XMLValidator.whatCanGoHere needs to get the content model of the element
in question. To do so, it calls XMLValidator.getElementContentModel, which
in turn calls Grammar.getElementDecl, to check if it has yet seen that
element declared. Which, in this case, it has; it's parsed and tried to
validate the entire document, so everything in the DTD should have been
seen -- right?

In Grammar.getElementDecl, we see the following code:


    public boolean getElementDecl(int elementDeclIndex, XMLElementDecl elementDecl) {
        if (elementDeclIndex < 0 || elementDeclIndex >= fElementDeclCount) {
            return false;
        }

It's returning false, because elementDeclIndex (the index of this element)
is 42, and fElementDeclCount (the number of elements which it has seen
defined) is 21. Bummer. (In this particular test case, I used Bosak's
play.dtd and a modified hamlet.xml which I made invalid but which is still
well-formed.)

I wondered if my problem was that I was getting the elementDeclIndex
wrong. My code gets it like so:

    public int getElementIndex (String elementName) {
        return(this.pool.addSymbol(elementName));
    }

this.pool is a pointer to the parser's StringPool field. I checked; it
does indeed get modified when the parser does its thing. addSymbol(),
while undocumented, does appear to return the indices of
previously-existing elements when presented with them, rather than always
adding them as new symbols.
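
The "return the existing index rather than add a duplicate" behavior described above can be sketched with a toy pool. This is only an illustration under assumptions: ToySymbolPool and its fields are hypothetical stand-ins written for this example, not the Xerces StringPool class, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a symbol pool; NOT the Xerces StringPool class.
class ToySymbolPool {
    private final Map<String, Integer> indexBySymbol = new HashMap<>();
    private final List<String> symbols = new ArrayList<>();

    // Returns the existing index if the symbol is already pooled,
    // otherwise appends it and returns the newly assigned index.
    int addSymbol(String symbol) {
        Integer existing = indexBySymbol.get(symbol);
        if (existing != null) {
            return existing;
        }
        int index = symbols.size();
        symbols.add(symbol);
        indexBySymbol.put(symbol, index);
        return index;
    }

    // Number of distinct symbols pooled so far.
    int size() {
        return symbols.size();
    }

    public static void main(String[] args) {
        ToySymbolPool pool = new ToySymbolPool();
        int first = pool.addSymbol("SPEECH");
        int second = pool.addSymbol("LINE");
        int again = pool.addSymbol("SPEECH"); // already pooled, so no new entry
        System.out.println(first + " " + second + " " + again + " size=" + pool.size());
        // prints "0 1 0 size=2"
    }
}
```

Note that an index handed out this way only identifies the name; nothing in the sketch says whether an element of that name was ever declared.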

Okay. So what am I doing wrong? Why does the validator think that this
element hasn't been seen yet? (I would be happy to provide full source
and a sample document to anyone who's interested.)

I'd love some help; I've been banging my head against this for a week
now, with no real results. I've looked at moving to other parsers, but
none of them seem to provide access to content models the way Xerces
does. I'm really frustrated -- and willing to help work out
requirements for future APIs (more than willing, actually). Please
help!

Thanks,
Jessica


Re: More on the fly validation troubles

Posted by Jessica Perry Hekman <jp...@arborius.net>.
On Fri, 6 Oct 2000, Andy Clark wrote:

> Yes, please send us all of your code and thoughts on the topic! :)

It's now working. I made four changes, below. (I didn't want to deal with
diffs since I've mucked with the source enough in debugging that I'd have
to spend more time editing out the cruft than is worthwhile.)

* [API change] In XMLValidator, I added:

    public Grammar getGrammar() {
        return(this.fGrammar);
    }

* [code change] In DFAContentModel#whatCanGoHere(), I added the final four
lines of this code snippet:

            // Get the current element index out
            final QName curElem = info.curChildren[childIndex];

            // not valid so return failure index
            if (curElem == null) {
                return childIndex;
            }

* [API change] I made DFAContentModel#whatCanGoHere() public:

    //protected int whatCanGoHere(int elementIndex, boolean fullyValid,
    public int whatCanGoHere(int elementIndex, boolean fullyValid,
                                InsertableElementsInfo info) throws Exception {


* [code change] In DFAContentModel#validateContent(), I added the final
three lines of this code snippet:

        //
        //  Lets loop through the children in the array and move our way
        //  through the states. Note that we use the fElemMap array to map
        //  an element index to a state index.
        //
        int curState = 0;
        for (int childIndex = 0; childIndex < length; childIndex++)
        {
            // Get the current element index out
            final QName curElem = children[offset + childIndex];
            // obviously not valid
            if (curElem == null)
                return childIndex;
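
To make the null checks and the "what can go here" query concrete, here is a minimal sketch of a state-machine content model. Everything in it is an assumption made for illustration: ToyContentModel, its methods, the use of plain strings for child names, and the -1-on-success convention are hypothetical and are not taken from the Xerces DFAContentModel source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy content model; NOT the Xerces DFAContentModel class.
class ToyContentModel {
    // transitions.get(state) maps a child element name to the next state
    private final List<Map<String, Integer>> transitions = new ArrayList<>();
    private final boolean[] accepting;

    ToyContentModel(int stateCount, int... acceptingStates) {
        for (int i = 0; i < stateCount; i++) transitions.add(new HashMap<>());
        accepting = new boolean[stateCount];
        for (int s : acceptingStates) accepting[s] = true;
    }

    void addTransition(int from, String child, int to) {
        transitions.get(from).put(child, to);
    }

    // Toy convention: returns -1 if the children are valid, else the index of
    // the first failure (children.length when the sequence ends too early).
    int validateContent(String[] children) {
        int state = 0;
        for (int i = 0; i < children.length; i++) {
            String child = children[i];
            if (child == null) return i;   // null slot: report its index as the failure point
            Integer next = transitions.get(state).get(child);
            if (next == null) return i;    // no transition for this child
            state = next;
        }
        return accepting[state] ? -1 : children.length;
    }

    // Names that may follow a valid prefix; empty if the prefix itself is invalid.
    TreeSet<String> allowedNext(String[] prefix) {
        int state = 0;
        for (String child : prefix) {
            Integer next = (child == null) ? null : transitions.get(state).get(child);
            if (next == null) return new TreeSet<>();
            state = next;
        }
        return new TreeSet<>(transitions.get(state).keySet());
    }

    public static void main(String[] args) {
        // Content model (SPEAKER+, LINE+), made up for the demo
        ToyContentModel model = new ToyContentModel(3, 2);
        model.addTransition(0, "SPEAKER", 1);
        model.addTransition(1, "SPEAKER", 1);
        model.addTransition(1, "LINE", 2);
        model.addTransition(2, "LINE", 2);
        System.out.println(model.validateContent(new String[] {"SPEAKER", "LINE"}));       // prints -1
        System.out.println(model.validateContent(new String[] {"SPEAKER", null, "LINE"})); // prints 1
        System.out.println(model.allowedNext(new String[] {"SPEAKER"}));                   // prints [LINE, SPEAKER]
    }
}
```

The point of the null guard is the same as in the changes listed above: a hole in the child array is treated as the failure position rather than being passed further into the state lookup.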

RevalidatingParser doesn't compile, which doesn't matter for the project
I'm working on, but I thought I'd mention it.

I had to subclass DOMParser in order to get hold of its StringPool and
Validator pointers.

> Since you are deeply involved with using this kind of information,
> I think that you are an ideal person to help fill in that area of
> the design. It was always the intention to provide this kind of

I would be happy to join in the discussions, but don't assume that I'm
deeply involved with this stuff; I just had this one contract which
happened to require making an invalid DOM tree valid, given certain
parameters. I did enjoy mucking around in this code, for what that's
worth. Also, your point about waiting for DOM3 to be done is definitely a
valid one.

Anyways, if given all that you still think it's worth my helping out, let
me know; I'd be happy to, as I said.

j


Re: More on the fly validation troubles

Posted by Andy Clark <an...@apache.org>.
Jessica Perry Hekman wrote:
> If I do get this stuff working, is there any point in letting 
> people know about the modifications I'm making locally so that 
> they can be added to the code base where appropriate? I know 
> that people are moving over to Xerces2, and I don't know the 
> level of abandonment of Xerces1. My modifications are both at 
> the API level (adding methods to give access to private fields, 
> for example) and at the code level (checking for nulls).

Yes, please send us all of your code and thoughts on the topic! :)

When writing up the Xerces2 concept design, I intentionally left 
out the area of grammar access and query. There is currently work 
in DOM Level 3 to provide access to content models and validation 
and I didn't want to design something that would be incompatible 
with that system (or others). Plus, I wasn't that impressed with
the way in which we provided this access in Xerces.

Since you are deeply involved with using this kind of information,
I think that you are an ideal person to help fill in that area of
the design. It was always the intention to provide this kind of
functionality in the new design but I just don't have the right
experience to know what's the best thing to do.

If you would like to help out in this area, please let me know.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: More on the fly validation troubles

Posted by Jessica Perry Hekman <jp...@arborius.net>.
On Thu, 5 Oct 2000, Andy Clark wrote:

> If you have the element name symbol and want to get the element 
> decl, you have to get the element decl index for that name. Notice 
> the Grammar#getElementDeclIndex(int,int) method. If you are only 

This did, indeed, get me past the spot where I was having problems. Thank
you so much, Andy; I appreciate your help.

I am now getting NullPointerExceptions in DFAContentModel, but as I've successfully tracked
down and fixed the first one of those, I hope to be able to work my way
through the second one as well (and presumably those which will,
inevitably, follow :)

If I do get this stuff working, is there any point in letting people know
about the modifications I'm making locally so that they can be added to
the code base where appropriate? I know that people are moving over to
Xerces2, and I don't know the level of abandonment of Xerces1. My
modifications are both at the API level (adding methods to give access to
private fields, for example) and at the code level (checking for nulls).

Thanks again,
Jessica



Re: More on the fly validation troubles

Posted by Jessica Perry Hekman <jp...@arborius.net>.
> Does this help any? Or were you already doing this? If so, then
> perhaps Eric's assessment of the state of whatCanGoHere is
> correct.

This is extremely helpful. I'll play around with it and report my findings
back to the list. Thanks!

Jessica


Re: More on the fly validation troubles

Posted by Andy Clark <an...@apache.org>.
Eric may be right regarding the current state of the whatCanGoHere
method. However, I think that I see some problems with the way that
you are querying the element declaration. Perhaps I'm just reading
your post wrong or missing some information but you can read on and
decide for yourself.

Names of elements and attributes are indeed assigned unique ids
from the string pool. Therefore, anytime that you are using the
element name, you have to pass in that unique id. Next, the id
for the element name symbol is NOT used as the element decl index. 
Each time an element type is declared it receives its own unique 
element decl index within the Grammar object. 

If you have the element name symbol and want to get the element 
decl, you have to get the element decl index for that name. Notice 
the Grammar#getElementDeclIndex(int,int) method. If you are only 
using DTDs, then this is the method that you want. You pass in the 
element name symbol for the "localpart" parameter and a value of 
-1 for the "scope" parameter. All top-level element declarations 
(all element decls are top level in DTDs) have a scope of -1. 
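
A minimal sketch of the two index spaces described above, under stated assumptions: ToyGrammar, declareElement, hasElementDecl and the symbol values are hypothetical and exist only for this illustration. Only the idea of looking up a decl index from a name symbol with a scope of -1 comes from the explanation above; the real Grammar class is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DTD grammar; NOT the Xerces Grammar class.
class ToyGrammar {
    static final int TOP_LEVEL_SCOPE = -1;

    // Name symbol of each declared element; the list position is the decl index.
    private final List<Integer> declaredNameSymbols = new ArrayList<>();

    // Declares an element and returns its decl index (0, 1, 2, ... in declaration order).
    int declareElement(int nameSymbol) {
        declaredNameSymbols.add(nameSymbol);
        return declaredNameSymbols.size() - 1;
    }

    // Maps a name symbol to a decl index, or -1 if there is no such declaration.
    int getElementDeclIndex(int nameSymbol, int scope) {
        if (scope != TOP_LEVEL_SCOPE) {
            return -1; // DTD-only toy: every declaration is top level
        }
        return declaredNameSymbols.indexOf(nameSymbol);
    }

    // Bounds check in the spirit of the getElementDecl excerpt quoted in the thread.
    boolean hasElementDecl(int elementDeclIndex) {
        return elementDeclIndex >= 0 && elementDeclIndex < declaredNameSymbols.size();
    }

    public static void main(String[] args) {
        ToyGrammar grammar = new ToyGrammar();
        // Declare 21 elements whose (made-up) name symbols are 22..42.
        for (int nameSymbol = 22; nameSymbol <= 42; nameSymbol++) {
            grammar.declareElement(nameSymbol);
        }
        int nameSymbol = 42;
        // Using the name symbol directly as a decl index fails the bounds check.
        System.out.println(grammar.hasElementDecl(nameSymbol)); // prints false
        // Translating it first gives an index inside the declared range.
        int declIndex = grammar.getElementDeclIndex(nameSymbol, TOP_LEVEL_SCOPE);
        System.out.println(declIndex);                          // prints 20
        System.out.println(grammar.hasElementDecl(declIndex));  // prints true
    }
}
```

In this toy, a name symbol of 42 with only 21 declarations is rejected when used as a decl index, which matches the symptom reported at the start of the thread.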

Does this help any? Or were you already doing this? If so, then
perhaps Eric's assessment of the state of whatCanGoHere is
correct.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: More on the fly validation troubles

Posted by Jessica Perry Hekman <jp...@arborius.net>.
On Thu, 5 Oct 2000, Eric Ye wrote:

> Didn't see your previous posting, but it looks like you are trying to work with
> whatCanGoHere(). Unfortunately, this method is way out of sync with the
> validation structure; the same applies to the methods with the same name in
> the content model classes. I am pretty sure they are all broken; sorry if
> this has cost you a lot of time. Since nobody is currently working on it,
> you could probably take a stab at fixing these methods, though it wouldn't be
> easy work.

That would explain why whatCanGoHere had its permissions set so that it
was completely inaccessible :) I did at one point try Xerces 1.0.3, in
which the method was public; does anyone know if it was expected to work
then? (I had the same problem with that version that I am having with
1.2.0.)

I'll look into Andy's suggestion, but it's good to know that I'm not
having the problem just because I'm missing something.

Thanks,
Jessica


Re: More on the fly validation troubles

Posted by Eric Ye <er...@locus.apache.org>.
Didn't see your previous posting, but it looks like you are trying to work with
whatCanGoHere(). Unfortunately, this method is way out of sync with the
validation structure; the same applies to the methods with the same name in
the content model classes. I am pretty sure they are all broken; sorry if
this has cost you a lot of time. Since nobody is currently working on it,
you could probably take a stab at fixing these methods, though it wouldn't be
easy work.
_____


Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org
