Posted to c-dev@xerces.apache.org by Marc Seldin <ms...@bbn.com> on 2005/09/01 18:52:56 UTC

position offsets in xerces

I'm struggling with a frustrating problem. I'm working on an NLP
application in which I need the file offsets at which elements occur.
I can't find a way to get these in DOM mode, so I turned to SAX parsing.
My plan was to build a DOM document from a SAX handler and use the
setUserData method to store each node's offset as it is discovered.

However, only one kind of object, a DOMLocator, seems to provide this
information. In contrast, the only locator that seems to be available in
SAX parsing (as far as I've discovered) is a plain Locator object, which
is not related to DOMLocator in any way.

My questions are thus:

1) Is there a better way to get these offsets than going through SAX
parsing?

2) Even using SAX parsing, how do I involve the DOMLocator class? Or
am I on the wrong track completely there?

Thanks in advance.

_______________________________________________
Marc Seldin
Scientist
BBN Technologies
410-290-6141

Re: position offsets in xerces

Posted by Alberto Massari <am...@datadirect.com>.
Hi Marc,
I would create a class derived from XercesDOMParser and override
startElement; there I would call the base class's implementation,
then retrieve getScanner()->getSrcOffset() and store it in the newly
created node (fCurrentNode, or, if the isEmpty parameter is "true",
fCurrentParent).
You can also do the same using the SAX parser.

Alberto
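
The derive-and-override pattern described in the reply can be sketched as
follows. This is only an illustration of the shape of the approach, not
Xerces code: to stay self-contained it uses small hypothetical stand-ins
(Node, TinyDomParser, srcOffset(), currentNode(), OffsetRecordingParser,
elementOffsets) in place of DOMNode, XercesDOMParser,
getScanner()->getSrcOffset() and fCurrentNode. The stand-in scanner is
deliberately naive and reports the position just past the start tag; which
position the real scanner reports at that point should be checked against
your Xerces version. The stand-in also always exposes the new element as
currentNode(), whereas the reply notes that in Xerces the member to read
depends on isEmpty.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a DOM node (hypothetical; not the Xerces DOMNode).
struct Node {
    std::string name;
    Node* parent;
};

// Stand-in for a DOM-building parser that exposes its scan position.
class TinyDomParser {
public:
    virtual ~TinyDomParser() = default;

    // Naive tag scanner: enough to drive the callbacks, not a real XML parser
    // (it does not handle '>' inside comments or attribute values).
    void parse(const std::string& text) {
        std::size_t i = 0;
        while ((i = text.find('<', i)) != std::string::npos) {
            const std::size_t close = text.find('>', i);
            if (close == std::string::npos) break;   // malformed input: stop
            pos_ = close + 1;                        // scanner is now past the tag
            const char c = text[i + 1];
            if (c == '/') {
                endElement();
            } else if (c != '?' && c != '!') {       // skip declarations and comments
                const std::size_t nameEnd = text.find_first_of(" \t\r\n/>", i + 1);
                const bool isEmpty = text[close - 1] == '/';
                startElement(text.substr(i + 1, nameEnd - (i + 1)), isEmpty);
            }
            i = close + 1;
        }
    }

protected:
    // Base behaviour: create the node and make it the current one.
    virtual void startElement(const std::string& name, bool isEmpty) {
        nodes_.push_back(std::make_unique<Node>(Node{name, currentParent_}));
        currentNode_ = nodes_.back().get();
        if (!isEmpty) currentParent_ = currentNode_;  // later children attach here
    }

    virtual void endElement() {
        if (currentParent_ != nullptr) {
            currentNode_ = currentParent_;
            currentParent_ = currentParent_->parent;
        }
    }

    std::size_t srcOffset() const { return pos_; }
    const Node* currentNode() const { return currentNode_; }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
    Node* currentParent_ = nullptr;
    Node* currentNode_ = nullptr;
    std::size_t pos_ = 0;
};

// The pattern itself: call the base implementation, then tag the new node.
class OffsetRecordingParser : public TinyDomParser {
public:
    const std::vector<std::pair<const Node*, std::size_t>>& offsets() const {
        return offsets_;
    }

protected:
    void startElement(const std::string& name, bool isEmpty) override {
        TinyDomParser::startElement(name, isEmpty);       // base class builds the node
        offsets_.emplace_back(currentNode(), srcOffset()); // then record the scan position
    }

private:
    std::vector<std::pair<const Node*, std::size_t>> offsets_;
};

// Convenience wrapper: element names with their recorded offsets, in document order.
std::vector<std::pair<std::string, std::size_t>> elementOffsets(const std::string& xml) {
    OffsetRecordingParser parser;
    parser.parse(xml);
    std::vector<std::pair<std::string, std::size_t>> out;
    for (const auto& entry : parser.offsets()) {
        out.emplace_back(entry.first->name, entry.second);
    }
    return out;
}
```

With this stand-in, elementOffsets("<a><b/></a>") reports a at 3 and b at 7.
The sketch keeps the offsets in a side table owned by the parser; with a
real DOM, storing the value on the node itself via setUserData, as the
original question proposes, would be the analogous step.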



