Posted to c-users@xerces.apache.org by Michael Weitzel <mi...@uni-siegen.de> on 2006/08/11 16:43:34 UTC

line and column of an element/DOMNode?

Hi all,

is there a way to determine the line and column of the XML file
associated with a specific DOMNode / element? The data in my XML format
requires a few complex semantic validations that cannot be expressed by
the DTD. The simpler errors can be detected based on the context while
traversing the DOM tree and it would be nice to give more specific error
messages.

Am I right that "DOMLocator" cannot be used since no "DOMError" occurs?

Thanks! :)
-- 
Michael Weitzel



Re: line and column of an element/DOMNode?

Posted by 4p...@sneakemail.com.
Thanks for the kind words :) I happened to be working on a project that 
uses the parser heavily at the time, and the code was abstracted enough 
that I could simply copy and paste it into an email. Xerces is free and 
relies on people donating code, so I thought I'd give something back.

I was suggesting the hash table solution because of the many allocations 
the current implementation makes (it uses std::string, for a start, and 
allocates a Tag for each DOMElement). You could create a hash table that 
allocates one large block up front, fills it until it's full, and only 
then resizes it if more Tags come along. Oh, and some std::string 
implementations might actually use copy-on-write; in that case, as long 
as each Tag's systemID is copied from an existing std::string rather than 
rebuilt from a char*, you don't really have a memory footprint problem 
with the systemID.
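A minimal sketch of the hash-table idea, assuming node pointers are used as keys (shown as `const void*` so it compiles without Xerces). `PositionTable` and `Position` are hypothetical names, not Xerces API. It uses open addressing with linear probing, so all entries live in one `std::vector` that is allocated once up front and only doubled once it would become more than half full:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Source position of one element (hypothetical type for this sketch).
struct Position {
    int line;
    int column;
};

// Open-addressing hash table from node pointer to Position. All slots live
// in a single vector; it is only reallocated (doubled) when the load factor
// would exceed 1/2, so lookups always reach an empty slot and terminate.
class PositionTable {
public:
    explicit PositionTable(std::size_t expectedNodes)
        : slots_(roundUpPow2(expectedNodes * 2)), used_(0) {}

    // Records (or overwrites) the position of a node.
    void record(const void* node, int line, int column) {
        if ((used_ + 1) * 2 > slots_.size())
            grow();
        insert(node, Position{line, column});
    }

    // Returns nullptr if no position was recorded for the node.
    const Position* find(const void* node) const {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = hash(node) & mask;; i = (i + 1) & mask) {
            const Slot& s = slots_[i];
            if (s.key == nullptr) return nullptr;  // empty slot: not present
            if (s.key == node) return &s.pos;
        }
    }

    std::size_t size() const { return used_; }
    std::size_t capacity() const { return slots_.size(); }

private:
    struct Slot {
        const void* key = nullptr;  // nullptr marks an empty slot
        Position pos{-1, -1};
    };

    // Capacity is a power of two (at least 16) so "& mask" replaces "%".
    static std::size_t roundUpPow2(std::size_t n) {
        std::size_t p = 16;
        while (p < n) p *= 2;
        return p;
    }

    // Pointers are aligned, so mix in higher bits to spread them out.
    static std::size_t hash(const void* p) {
        const std::uintptr_t x = reinterpret_cast<std::uintptr_t>(p);
        return static_cast<std::size_t>((x >> 3) ^ (x >> 11));
    }

    void insert(const void* node, Position pos) {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = hash(node) & mask;; i = (i + 1) & mask) {
            Slot& s = slots_[i];
            if (s.key == nullptr) { s.key = node; s.pos = pos; ++used_; return; }
            if (s.key == node) { s.pos = pos; return; }
        }
    }

    // Doubles the slot array and re-inserts every entry.
    void grow() {
        std::vector<Slot> old;
        old.swap(slots_);
        slots_.assign(old.size() * 2, Slot());
        used_ = 0;
        for (const Slot& s : old)
            if (s.key != nullptr) insert(s.key, s.pos);
    }

    std::vector<Slot> slots_;
    std::size_t used_;
};
```

Unlike the Tag approach, nothing is attached to the nodes themselves, so clones and imported nodes have no entry unless you add one. Such a table should also be discarded together with its document, because a released node's address may later be reused by a new node.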

Storing a pointer to the parent's systemID is in principle not a bad 
solution; however, it makes the implementation more complicated, since 
you could import nodes from other DOMDocuments, so you'd end up with 
Tags in document A pointing to Tags in document B, which could create 
problems when you delete B (for instance by deleting its parent parser).

But feel free to change it; that's why I posted it here ;) I originally 
only wanted something that works, without paying too much attention to 
its memory footprint or performance (other parts of my project demand 
much more attention in these and other aspects *g*)

Glad I could help,

Cheers,

Uwe


Michael Weitzel michael.weitzel-at-uni-siegen.de |xerces-c-users mailing list| wrote:
> [...]


Re: line and column of an element/DOMNode?

Posted by Michael Weitzel <mi...@uni-siegen.de>.
Hi Uwe,

I am really impressed by the speed of your response and the universality
of your solution. Your TaggingDOMParser works just fine. Many thanks :-)

I agree that locating the source of an error is a common concern for any
context-sensitive application (IDREFs are useful, but this method is too
weak because of its global scoping). Maybe this is a problem of the DOM
standard ...

I think your solution is just fine. Wouldn't a hash table for the tags
create additional overhead? I will remove the SystemID from the Tags to
save memory; it is redundant to store it in every Tag. Maybe it should be
replaced by a pointer to the parent's SystemID, similar to the static
scoping found in programming languages with nested blocks, where a
variable is looked up in the surrounding ("parent") blocks when it can't
be found in the current block.
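The static-scoping idea can be sketched as follows, assuming a hypothetical `ScopedTag` that stores a system ID only where it differs from its parent's (the root, or an element expanded from an external entity) and keeps a plain pointer to the parent's Tag. As Uwe's reply in this thread points out, such pointers are only safe while all Tags live in one document; importing nodes from another document could leave them dangling.

```cpp
#include <string>

// Hypothetical position record: only a node whose system ID differs from
// its parent's stores one; every other node leaves it empty and inherits
// it from the nearest ancestor that has one.
struct ScopedTag {
    const ScopedTag* parent = nullptr;  // Tag of the enclosing element
    std::string systemID;               // empty = same as parent
    int lineNumber = -1;
    int columnNumber = -1;
};

// Walks up the parent chain, like resolving a variable in enclosing blocks.
// Returns an empty string if no ancestor recorded a system ID.
const std::string& resolveSystemId(const ScopedTag& tag) {
    static const std::string unknown;
    for (const ScopedTag* t = &tag; t != nullptr; t = t->parent)
        if (!t->systemID.empty())
            return t->systemID;
    return unknown;
}
```

The lookup costs one step per nesting level, which is usually shallow, in exchange for storing the string only once per file.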

Thanks again :-)

On Friday, 11 August 2006, at 18:07, 4pzbrog02@sneakemail.com wrote:

> [...]
-- 
Michael Weitzel


Re: line and column of an element/DOMNode?

Posted by 4p...@sneakemail.com.
First of all, this is one of those recurring questions that get asked 
again and again. Maybe the answer should be put somewhere so prominent 
(the FAQ?) that it stops coming up (I remember asking the same question 
myself a couple of months ago).

You will have to attach tagging objects to each DOMElement while you let 
the parser parse the XML file. So basically, what you do is derive a 
class from XercesDOMParser and override its startElement function (yep, 
under the hood it receives scanner callbacks much like a SAX parser 
does). Another option might be to maintain a hash table with pointers to 
DOMNodes as keys, but I didn't do that (hmm, might not be too bad an 
idea, maybe next time ;)

The Tags need to be reference counted if you want to clone nodes (hmm, I 
wonder whether I actually do that in my project); otherwise you'll end up 
with nasty lifetime issues for your Tag objects. For this reason, Tags 
come with their own DOMUserDataHandler, which takes care of these 
lifetime issues.
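The lifetime rule can be illustrated without Xerces. In this minimal model, `CountedTag`, `onNodeCloned` and `onNodeDeleted` are hypothetical stand-ins for the Tag and for what the DOMUserDataHandler does on NODE_CLONED and NODE_DELETED; the static `live` counter exists only to make the Tag's lifetime observable:

```cpp
#include <cassert>

// Model of the reference-counted Tag: one Tag is shared by a node and all
// of its clones, and it deletes itself when the last reference is dropped.
struct CountedTag {
    static int live;  // number of CountedTag objects currently alive
    int lineNumber;
    int columnNumber;

    CountedTag(int line, int column)
        : lineNumber(line), columnNumber(column), referenceCount(0) { ++live; }

    void link() { ++referenceCount; }

    void unlink() {
        assert(referenceCount > 0);
        if (--referenceCount == 0)
            delete this;
    }

private:
    ~CountedTag() { --live; }  // only unlink() may destroy a Tag
    int referenceCount;
};

int CountedTag::live = 0;

// Hypothetical handler callbacks (not the Xerces API).
void onNodeCloned(CountedTag* tag) { tag->link(); }
void onNodeDeleted(CountedTag* tag) { tag->unlink(); }
```

The Tag survives the release of the original node and is freed only when the last node sharing it (here the clone) is released.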

However, I hope that the code I'm submitting here will answer it once 
and for all (although I don't guarantee it to be perfect or performant; 
it just works for what I do, and if it doesn't work for you, it's your 
problem - and <disclaimer>I DON'T TAKE RESPONSIBILITY FOR ANY PROBLEMS 
CAUSED BY IT</disclaimer> ;)

Below are snippets of the code I use to do this.

I hope this code is somewhat useful to you, Michael, and also to everyone 
else who stumbles over this problem. It seems so simple that one would 
assume Xerces can do it out of the box, but unfortunately it can't.

Cheers,

Uwe

------------ from TaggingDOMParser.hpp:

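#ifndef TAGGINGDOMPARSER_HPP
#define TAGGINGDOMPARSER_HPP

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOMNode.hpp>
#include <xercesc/dom/DOMUserDataHandler.hpp>

#include <cassert>
#include <string>

// A XercesDOMParser that attaches a Tag (system ID, line, column) to each
// DOMElement as DOM user data while the document is being parsed.
class TaggingDOMParser : public XERCES_CPP_NAMESPACE::XercesDOMParser {
    class TagDataHandler;
    friend class TagDataHandler;

    public:
        // Source position of an element. One Tag may be shared by a node
        // and its clones/imports, so it is reference counted.
        struct Tag {
            public:
                inline Tag()
                :   lineNumber(-1),
                    columnNumber(-1),
                    referenceCount(0){
                }

                inline void link(){
                    ++referenceCount;
                }

                // Drops one reference; the last one deletes the Tag.
                inline void unlink(){
                    assert(referenceCount>0);
                    --referenceCount;
                    if(referenceCount <= 0)
                        delete this;
                }

            public:
                std::string systemID;
                int lineNumber;
                int columnNumber;

            private:
                int referenceCount;

            protected:
                // Not public: Tags are only destroyed through unlink().
                inline ~Tag(){
                }
        };

    private:
        TagDataHandler* dataHandler;

    protected:
        Tag* createTag();

    public:
        TaggingDOMParser();
        virtual ~TaggingDOMParser();

        // Scanner callback (Xerces-C 2.x signature; later releases changed
        // some parameter types), overridden to tag each new element.
        virtual void startElement(
            const XERCES_CPP_NAMESPACE::XMLElementDecl& elemDecl,
            const unsigned int uriId,
            const XMLCh* const prefixName,
            const XERCES_CPP_NAMESPACE::RefVectorOf<XERCES_CPP_NAMESPACE::XMLAttr>& attrList,
            const unsigned int attrCount,
            const bool isEmpty,
            const bool isRoot);

        // Returns the Tag attached during parsing, or 0 if the node has none.
        static const TaggingDOMParser::Tag* getTag(
            const XERCES_CPP_NAMESPACE::DOMNode* node);
};

#endif // TAGGINGDOMPARSER_HPP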


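------------ from TaggingDOMParser.cpp:

#include "TaggingDOMParser.hpp"

#include <xercesc/dom/DOMNode.hpp>
#include <xercesc/internal/XMLScanner.hpp>
#include <xercesc/sax/Locator.hpp>

#include "StrX.hpp"

using namespace XERCES_CPP_NAMESPACE;


// User data key. XMLCh is a 16-bit UTF-16 code unit, while a wide literal
// (L"...") is made of wchar_t, which is 32 bits on most Unix systems, so
// the key is spelled out as an XMLCh array instead.
static const XMLCh tagKey[] = {
    'L','i','n','e','N','u','m','b','e','r',
    'A','n','n','o','t','a','t','i','o','n', 0
};

class TaggingDOMParser::TagDataHandler : public DOMUserDataHandler {
    private:
        TaggingDOMParser* parser;

    public:
        TagDataHandler()
        :   parser(0)
        {
        }

        virtual ~TagDataHandler(){
        }

        inline void setParser(TaggingDOMParser* parser){
            this->parser = parser;
        }

        virtual void handle(DOMOperationType operation, const XMLCh* const key,
                            void* data, const DOMNode* src, const DOMNode* dst){
            Tag* srcTag = static_cast<Tag*>(data);
            if(!srcTag)
                return;
            switch(operation){
                // Import and clone are basically the same case: the node is
                // copied. User data is not carried over to the copy, so
                // attach the shared Tag to it and count the extra reference.
                case NODE_IMPORTED:
                case NODE_CLONED:
                    if(dst){
                        const_cast<DOMNode*>(dst)->setUserData(key, srcTag, this);
                        srcTag->link();
                    }
                    break;
                case NODE_DELETED:
                    srcTag->unlink();
                    break;
                default:
                    // NODE_RENAMED (and any newer operation types): nothing to do
                    break;
            }
        }
};


TaggingDOMParser::TaggingDOMParser()
:   dataHandler(new TagDataHandler()){
    dataHandler->setParser(this);
}


TaggingDOMParser::~TaggingDOMParser()
{
    // dataHandler is deliberately not deleted: documents still owned by the
    // base class are released after this destructor body runs, and adopted
    // documents or imported nodes may call it with NODE_DELETED even later.
}


TaggingDOMParser::Tag* TaggingDOMParser::createTag(){
    return new Tag();
}


void TaggingDOMParser::startElement(
    const XMLElementDecl& elemDecl,
    const unsigned int uriId,
    const XMLCh* const prefixName,
    const RefVectorOf<XMLAttr>& attrList,
    const unsigned int attrCount,
    const bool isEmpty,
    const bool isRoot
    ) {
    // Let the base class create the DOMElement first; for a non-empty
    // element, fCurrentNode then points at the new element.
    XercesDOMParser::startElement(elemDecl, uriId, prefixName, attrList,
                                  attrCount, isEmpty, isRoot);

    // Empty elements (<foo/>) are skipped, so getTag() returns 0 for them.
    if(!isEmpty){
        Tag* tag = createTag();
        // The locator reports the scanner's current position, which is
        // typically the end of the start tag rather than its '<'.
        const Locator* locator = getScanner()->getLocator();
        tag->systemID = StrX(locator->getSystemId()).getString();
        tag->lineNumber = locator->getLineNumber();
        tag->columnNumber = locator->getColumnNumber();

        fCurrentNode->setUserData(tagKey, tag, dataHandler);

        tag->link();
    }
}


const TaggingDOMParser::Tag* TaggingDOMParser::getTag(const DOMNode* node){
    return static_cast<const TaggingDOMParser::Tag*>(node->getUserData(tagKey));
}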

-------------------
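To turn a Tag into the kind of specific error message asked about in the original question, something like the following could work. It is a sketch: `SourceTag` is a hypothetical stand-in with the same public fields as `TaggingDOMParser::Tag` so that it compiles without Xerces, and `formatError` is an illustrative helper; in real code the pointer would come from `TaggingDOMParser::getTag(node)`.

```cpp
#include <sstream>
#include <string>

// Stand-in with the same public fields as TaggingDOMParser::Tag.
struct SourceTag {
    std::string systemID;
    int lineNumber = -1;
    int columnNumber = -1;
};

// Builds a compiler-style "file:line:column: message" string. getTag()
// returns 0 for nodes that were never tagged (e.g. empty elements, or nodes
// created after parsing), so fall back to a generic prefix in that case.
std::string formatError(const SourceTag* tag, const std::string& message) {
    std::ostringstream out;
    if (tag)
        out << tag->systemID << ':' << tag->lineNumber << ':'
            << tag->columnNumber << ": " << message;
    else
        out << "<unknown position>: " << message;
    return out.str();
}
```

Handling the null case matters with the code above, since empty elements never receive a Tag.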

Oh, and StrX() is actually just a helper class that takes an XMLCh* in 
its constructor and stores the UTF-8-transcoded version, which can be 
obtained by its getString() method; you might find a similar 
implementation in Xerces' examples.
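A self-contained stand-in for such a helper might look like this. It is a sketch under two assumptions: that XMLCh holds UTF-16 code units (modelled here as `char16_t` so it compiles without Xerces), and that hand-rolled UTF-16 to UTF-8 conversion is acceptable; real code would more likely use Xerces' own transcoding services. Only the `StrX`/`getString()` names come from the description above.

```cpp
#include <string>

// Hypothetical StrX: transcodes a null-terminated UTF-16 string to UTF-8.
class StrX {
public:
    explicit StrX(const char16_t* text) {
        if (!text) return;  // treat a null pointer as an empty string
        for (const char16_t* p = text; *p; ++p) {
            char32_t cp = *p;
            // Combine a high/low surrogate pair into one code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && p[1] >= 0xDC00 && p[1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (p[1] - 0xDC00);
                ++p;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD;  // lone surrogate: use the replacement character
            }
            append(cp);
        }
    }

    const char* getString() const { return utf8_.c_str(); }

private:
    // Appends one code point as 1 to 4 UTF-8 bytes.
    void append(char32_t cp) {
        if (cp < 0x80) {
            utf8_ += static_cast<char>(cp);
        } else if (cp < 0x800) {
            utf8_ += static_cast<char>(0xC0 | (cp >> 6));
            utf8_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            utf8_ += static_cast<char>(0xE0 | (cp >> 12));
            utf8_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            utf8_ += static_cast<char>(0xF0 | (cp >> 18));
            utf8_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8_ += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::string utf8_;
};
```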

Hope this helps, feedback welcome.

Cheers,

Uwe


Michael Weitzel michael.weitzel-at-uni-siegen.de |xerces-c-users mailing list| wrote:
> [...]