Posted to c-dev@xerces.apache.org by Cyberthymia <cy...@yahoo.co.uk> on 2001/11/22 12:08:22 UTC

Storing character positions in DOM

Hi

I'm currently in the process of adding an XML parser to a project I'm working on, using the DOM parser in XercesC.
The problem I've got at the moment is that I need to be able to tie in the nodes on the DOM tree to the original XML source being processed, i.e.
store the start and end character positions for each node (including elements, attributes and text data).

(As a bit of background info, I did originally look at using the SAX parser, but I couldn't get this to work as it doesn't throw events for the attributes. Besides, I need to end up with a DOM interface, and didn't want to write my own to sit on top when Xerces gives you a perfectly good one already.)

I've had a play around with Xerces and have reached the conclusion that there is no way of doing this without modifying the parser code itself, and adding new methods to the DOM classes for getting / setting the positions.

Before I start though, I wanted to see if anyone knows of a better way of doing this, as any changes I make to the parser are going to have to be carefully recorded and reimplemented every time I pick up a newer version of Xerces.

Thanks for any advice / help,
Richard Jinks


RE: Storing character positions in DOM

Posted by Erik Rydgren <er...@mandarinen.se>.
The solution for you is virtual methods. If you change all methods in the
SAX parser to virtual, you can subclass the parser and override the
functions in the original code to do what you want, and still have the
original functionality to back you up.
The benefit is that you will have to modify much less code on new releases,
because the modified code is in your classes and not in the ones that
change on a new release.

Why modify when you can tweak instead? :)

Just an example:

class YourSAXParser : protected SAXParser
{
protected:
  // Your new code
  int GetFilePos()
  {
    // Do your magic here, using a mix of new and old code
    SAXParser::Method();   // call old code
    Method();              // call new code
    return 0;              // placeholder: return the position you worked out
  }

  // Original SAX method overrides
  virtual void aMethod(int anArgument) {
    int nFilePos = GetFilePos();
    aNewMethod(anArgument, nFilePos);
  }

public:

  // Your new SAX API methods
  virtual void aNewMethod(int anArgument, int nFilePos) {}
};

Best regards
Erik Rydgren
Mandarinen systems AB
Sweden

-----Original Message-----
From: Richard Jinks [mailto:cyberthymia@yahoo.co.uk]
Sent: den 23 november 2001 10:42
To: xerces-c-dev@xml.apache.org
Subject: Re: Storing character positions in DOM


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Storing character positions in DOM

Posted by Richard Jinks <cy...@yahoo.co.uk>.
From: "Jason E. Stewart" <ja...@openinformatics.com>


> "Cyberthymia" <cy...@yahoo.co.uk> writes:
>
> > I'm currently in the process of adding an XML parser to a project
> > I'm working on, using the DOM parser in XercesC.  The problem I've
> > got at the moment is that I need to be able to tie in the nodes on
> > the DOM tree to the original XML source being processed, i.e.  store
> > the start and end character positions for each node (including
> > elements, attributes and text data).
>
> > (As a bit of background info, I did originally look at using the SAX
> > parser, but I couldn't get this to work as it doesn't throw events
> > for the attributes. Besides I need to end up with a DOM interface,
> > and didn't want to write my own to sit on top when Xerces gives you
> > a perfectly good one already)
>
> > I've had a play around with Xerces and have reached the conclusion
> > that there is no way of doing this without modifying the parser code
> > itself, and adding new methods to the DOM classes for getting /
> > setting the positions.
> >
> > Before I start though, I wanted to see if anyone knows of a better
> > way of doing this, as any changes I make to the parser are going to
> > have to be carefully recorded and reimplemented every time I pick up
> > a newer version of Xerces.
>
> I think you probably want to subclass DOMParser (or IDOMParser) and
> use your subclass instead of redefining DOMParser. That way you could
> add all the methods you wanted.
>
> I'd actually like to see this get implemented for the SAXParser. I
> don't need to know where the attributes are, just the elements, so SAX
> would do all of what I need.
>
> jas.
>

My OO theory isn't up to scratch (it's been 5 years since my OO course at
Uni, so I'm not fully up on all the weird and wonderful things you can do
with classes), but I'm not sure using subclasses will help solve all of my
problems as I'd need to change existing methods.
As far as I can tell, the current position in the file isn't stored
anywhere. It uses a buffer to read in from the input source, so only stores the
current position in the buffer. Once I've modified the code that reads from
the buffer to keep track of the overall position, I can then subclass it to
add the methods to get access to the counter.
Similarly, I'd need to modify the method that parses the elements to add the
current position to the node for both the element and any attributes.
But then I could quite easily be missing some useful way of doing this.
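
To make the buffer idea concrete, here is a minimal sketch of a reader that
keeps an overall position across buffer refills. TrackingReader and findOffset
are hypothetical names for illustration only; this is not the Xerces reader
code, just the bookkeeping it would need under the assumption that input is
consumed one fixed-size buffer at a time.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser's buffered reader; not a Xerces class.
class TrackingReader {
public:
    explicit TrackingReader(std::istream& in, std::size_t bufSize = 8)
        : in_(in), buf_(bufSize) {}

    // Stores the next character in c; returns false at end of input.
    bool next(char& c) {
        if (index_ == filled_ && !refill()) return false;
        c = buf_[index_++];
        return true;
    }

    // Absolute offset of the next unread character in the whole input.
    std::size_t position() const { return base_ + index_; }

private:
    bool refill() {
        base_ += filled_;  // the old buffer has been fully consumed
        in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        filled_ = static_cast<std::size_t>(in_.gcount());
        index_ = 0;
        return filled_ > 0;
    }

    std::istream& in_;
    std::vector<char> buf_;
    std::size_t base_ = 0;    // characters consumed in earlier buffers
    std::size_t filled_ = 0;  // valid characters in the current buffer
    std::size_t index_ = 0;   // read position inside the current buffer
};

// Returns the absolute offset of the first occurrence of target, or -1.
long findOffset(const std::string& text, char target) {
    std::istringstream in(text);
    TrackingReader reader(in);
    char c;
    while (true) {
        const std::size_t pos = reader.position();
        if (!reader.next(c)) return -1;
        if (c == target) return static_cast<long>(pos);
    }
}
```

The only extra state is base_, which grows by the size of each exhausted
buffer, so position() stays correct even when a node straddles a refill.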

As far as doing this with SAX, (this is probably the equivalent of heresy
on this newsgroup, but...) have you looked at expat? This does have methods
that let you find out where in the input stream you are.

Thanks for your help,
Richard




Re: Storing character positions in DOM

Posted by Juergen Hermann <jh...@web.de>.
On 22 Nov 2001 08:39:31 -0700, Jason E. Stewart wrote:

>I'd actually like to see this get implemented for the SAXParser. I
>don't need to know where the attributes are, just the elements, so SAX
>would do all of what I need. 

You DO realize that the SAX Locator interface probably does what you 
want?

Also, when you add any code for this, consider supporting the following 
SAX property:

# http://xml.org/sax/properties/xml-string
#
# data type: String
# description: The literal string of characters that was the source for
#              the current event.
# access: read-only
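
As a rough illustration of the Locator pattern: the parser hands the handler a
locator once, and the handler queries it during each event. The types below
(Locator, Handler, ToyParser, RecordingHandler, locateElements) are simplified
stand-ins written for this sketch, not the Xerces-C classes, and the toy
parser reports the position of the opening '<', which is an assumption of the
sketch rather than a statement about what a real parser reports.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-ins for the SAX interfaces; not the Xerces-C classes.
struct Locator {
    virtual ~Locator() = default;
    virtual std::size_t line() const = 0;
    virtual std::size_t column() const = 0;
};

struct Handler {
    virtual ~Handler() = default;
    virtual void setDocumentLocator(const Locator* loc) = 0;
    virtual void startElement(const std::string& name) = 0;
};

// Toy parser: fires startElement for each "<name" and acts as its own locator.
class ToyParser : public Locator {
public:
    std::size_t line() const override { return line_; }
    std::size_t column() const override { return column_; }

    void parse(const std::string& text, Handler& handler) {
        handler.setDocumentLocator(this);  // handed over once, before any event
        line_ = 1;
        column_ = 1;
        for (std::size_t i = 0; i < text.size(); ++i) {
            const char c = text[i];
            if (c == '<' && i + 1 < text.size() &&
                std::isalpha(static_cast<unsigned char>(text[i + 1]))) {
                std::size_t end = i + 1;
                while (end < text.size() &&
                       std::isalnum(static_cast<unsigned char>(text[end]))) ++end;
                handler.startElement(text.substr(i + 1, end - i - 1));
            }
            if (c == '\n') { ++line_; column_ = 1; } else { ++column_; }
        }
    }

private:
    std::size_t line_ = 1;    // 1-based line of the character being scanned
    std::size_t column_ = 1;  // 1-based column of the character being scanned
};

struct ElementPos { std::string name; std::size_t line; std::size_t column; };

// Handler that keeps the locator and asks it where each element started.
class RecordingHandler : public Handler {
public:
    void setDocumentLocator(const Locator* loc) override { locator_ = loc; }
    void startElement(const std::string& name) override {
        if (locator_) found.push_back({name, locator_->line(), locator_->column()});
    }
    std::vector<ElementPos> found;

private:
    const Locator* locator_ = nullptr;
};

std::vector<ElementPos> locateElements(const std::string& xml) {
    ToyParser parser;
    RecordingHandler handler;
    parser.parse(xml, handler);
    return handler.found;
}
```

In Xerces-C the corresponding pieces should be setDocumentLocator on the
document handler and getLineNumber / getColumnNumber on the Locator; check the
headers of your version. Note that a Locator reports line and column, not a
character offset from the start of the input.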



Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/




Re: Storing character positions in DOM

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Cyberthymia" <cy...@yahoo.co.uk> writes:

> I'm currently in the process of adding an XML parser to a project
> I'm working on, using the DOM parser in XercesC.  The problem I've
> got at the moment is that I need to be able to tie in the nodes on
> the DOM tree to the original XML source being processed, i.e.  store
> the start and end character positions for each node (including
> elements, attributes and text data).

> (As a bit of background info, I did originally look at using the SAX
> parser, but I couldn't get this to work as it doesn't throw events
> for the attributes. Besides I need to end up with a DOM interface,
> and didn't want to write my own to sit on top when Xerces gives you
> a perfectly good one already)

> I've had a play around with Xerces and have reached the conclusion
> that there is no way of doing this without modifying the parser code
> itself, and adding new methods to the DOM classes for getting /
> setting the positions.
> 
> Before I start though, I wanted to see if anyone knows of a better
> way of doing this, as any changes I make to the parser are going to
> have to be carefully recorded and reimplemented every time I pick up
> a newer version of Xerces.

I think you probably want to subclass DOMParser (or IDOMParser) and
use your subclass instead of redefining DOMParser. That way you could
add all the methods you wanted.
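
A minimal sketch of that shape, with loud assumptions: BaseParser, Node, Span
and the nodeCreated hook are hypothetical stand-ins, and the sketch assumes the
base parser already knows the offsets and exposes a virtual hook, which the
real DOMParser may not do. The point is only that the subclass can keep
positions in a side table keyed by node, so the node classes themselves stay
untouched.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Node { std::string name; };                  // stand-in for a DOM node
struct Span { std::size_t start; std::size_t end; };  // [start, end) offsets

// Hypothetical library parser: builds a node for each "<name>" start tag.
class BaseParser {
public:
    virtual ~BaseParser() = default;

    void parse(const std::string& text) {
        std::size_t i = 0;
        while ((i = text.find('<', i)) != std::string::npos) {
            const std::size_t close = text.find('>', i);
            if (close == std::string::npos) break;
            if (text[i + 1] != '/') {  // skip end tags
                nodes_.push_back(std::make_unique<Node>(
                    Node{text.substr(i + 1, close - i - 1)}));
                nodeCreated(*nodes_.back(), i, close + 1);
            }
            i = close + 1;
        }
    }

    const std::vector<std::unique_ptr<Node>>& nodes() const { return nodes_; }

protected:
    // Assumed extension point; the default does nothing.
    virtual void nodeCreated(const Node&, std::size_t, std::size_t) {}

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// The subclass adds the position lookup without changing Node.
class PositionParser : public BaseParser {
public:
    // Returns the recorded span for node, or nullptr if none was recorded.
    const Span* positionOf(const Node& node) const {
        const auto it = spans_.find(&node);
        return it == spans_.end() ? nullptr : &it->second;
    }

protected:
    void nodeCreated(const Node& node, std::size_t start, std::size_t end) override {
        spans_[&node] = Span{start, end};
    }

private:
    std::map<const Node*, Span> spans_;  // side table: node -> source span
};
```

The side table lives and dies with the parser object, so lookups are only
valid while both the parser and the nodes it created are still alive.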

I'd actually like to see this get implemented for the SAXParser. I
don't need to know where the attributes are, just the elements, so SAX
would do all of what I need. 

jas.
