Posted to c-users@xerces.apache.org by Byron Campen <bc...@estacado.net> on 2008/06/20 17:31:40 UTC

SAX2, source offsets, and OS X

	When trying to use the SAXParser::getSrcOffset() call, I'm getting a  
runtime exception stating "The current transcoding service does not  
support source offset information". I have set the  
fgXercesCalculateSrcOfs bool to true, and a little googling seems to  
indicate that source offsets are not supported in the  
MacOSUnicodeConverter.

	Is there some sort of workaround for this? If not, I need to find  
another way to solve the following problem:

	When encountering a specific element, I want to be able to get the raw  
character buffer of everything inside that element (not just the  
text-value of the element, the raw buffer containing all descendants  
and such; stuff that the characters() callback doesn't give you). I  
/could/ assemble this buffer piecemeal from the callbacks I get from  
the parser, but that would be a really hideous solution.
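
To make the goal concrete, here is a minimal sketch of the slicing step only. It assumes the whole document is already held in memory as a byte buffer, and that the two offsets (hypothetical values recorded when the element starts and ends) are byte positions into that same buffer. The name extract_span is made up for illustration and is not part of Xerces-C.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies the raw bytes in [begin, end) of 'doc' into 'out'.
// Returns false, leaving 'out' untouched, if the offsets are
// reversed or run past the end of the buffer.
// 'begin' and 'end' are assumed to be byte offsets into 'doc'.
bool extract_span(const std::string& doc,
                  std::size_t begin,
                  std::size_t end,
                  std::string& out)
{
    if (begin > end || end > doc.size())
        return false;

    out.assign(doc, begin, end - begin);

    // Post-condition: we copied exactly the requested number of bytes.
    assert(out.size() == end - begin);
    return true;
}
```

For the buffer "<a><b>x</b></a>", offsets 3 and 11 would yield "<b>x</b>", markup included, which is the part the characters() callback never reports.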

Best regards,
Byron Campen


Re: SAX2, source offsets, and OS X

Posted by Byron Campen <bc...@estacado.net>.
On Jun 20, 2008, at 12:09 PM, David Bertoni wrote:

> Byron Campen wrote:
>>    When trying to use the SAXParser::getSrcOffset() call, I'm  
>> getting a runtime exception stating "The current transcoding  
>> service does not support source offset information". I have set the  
>> fgXercesCalculateSrcOfs bool to true, and a little googling seems  
>> to indicate that source offsets are not supported in the  
>> MacOSUnicodeConverter.
>>    Is there some sort of workaround for this? If not, I need to  
>> find another way to solve the following problem:
> You can build Xerces-C with the ICU transcoding service, which  
> supports offsets.  You'll need to build a copy of the ICU first, but  
> that shouldn't be difficult.  However, do not build Xerces-C using  
> the ICU as a message loader, because newer versions of the ICU  
> don't work with Xerces-C.

	Right-o, thanks.

>>    When encountering a specific element, I want to be able to get the  
>> raw character buffer of everything inside that element (not just  
>> the text-value of the element, the raw buffer containing all  
>> descendants and such; stuff that the characters() callback doesn't  
>> give you). I /could/ assemble this buffer piecemeal from the  
>> callbacks I get from the parser, but that would be a really hideous  
>> solution.
> I have to warn you that source offsets are not well tested.  There  
> are some known bugs in reporting offsets correctly in cases where  
> there are transcoding errors, but perhaps that won't affect your use  
> case.
>

	Everything I'm working on is strictly UTF-8, so I don't expect too  
many problems. I'll be sure to sanity-check the offsets I get back  
though, just in case.
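
A sanity check along these lines might look like the sketch below. It assumes the offsets are byte positions into the original UTF-8 buffer, and it only verifies that they are in range, correctly ordered, and do not land in the middle of a multi-byte UTF-8 sequence. The function names are hypothetical, not Xerces-C API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if a character can start at 'pos', i.e. the byte there is not a
// UTF-8 continuation byte (bit pattern 10xxxxxx). The end of the buffer
// counts as a boundary. Precondition: pos <= doc.size().
bool is_utf8_boundary(const std::string& doc, std::size_t pos)
{
    assert(pos <= doc.size());
    if (pos == doc.size())
        return true;
    return (static_cast<unsigned char>(doc[pos]) & 0xC0) != 0x80;
}

// Rejects offset pairs that are reversed, out of range, or that would
// split a multi-byte UTF-8 character at either end of the span.
bool offsets_look_sane(const std::string& doc,
                       std::size_t begin,
                       std::size_t end)
{
    if (begin > end || end > doc.size())
        return false;
    return is_utf8_boundary(doc, begin) && is_utf8_boundary(doc, end);
}
```

Passing this check does not prove the offsets point at the right element; it only catches the obviously broken cases before the buffer is sliced.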

Best regards,
Byron Campen


Re: SAX2, source offsets, and OS X

Posted by Byron Campen <bc...@estacado.net>.
	Yeah, I wasn't aware of the context-sensitive meaning of ICUROOT  
(building against an installed copy vs. building against a source  
tree). I have sent a patch to the maintainer of the macport that adds  
an icu_transcoder variant. If anyone wants to play with it, here it is:


Re: SAX2, source offsets, and OS X

Posted by David Bertoni <db...@apache.org>.
Byron Campen wrote:
>> Byron Campen wrote:
...
> 
>     Is there some trick to building against an installed version of ICU? 
> Or does xercesc actually need the source tree?
It doesn't need the source tree.  Try reading the build instructions to 
see if that helps:

http://xerces.apache.org/xerces-c/build-misc.html#ICUPerl

In particular, you'll need to set the ICUROOT environment variable to 
point where you installed the ICU.

Dave

Re: SAX2, source offsets, and OS X

Posted by Byron Campen <bc...@estacado.net>.
> Byron Campen wrote:
>>    When trying to use the SAXParser::getSrcOffset() call, I'm  
>> getting a runtime exception stating "The current transcoding  
>> service does not support source offset information". I have set the  
>> fgXercesCalculateSrcOfs bool to true, and a little googling seems  
>> to indicate that source offsets are not supported in the  
>> MacOSUnicodeConverter.
>>    Is there some sort of workaround for this? If not, I need to  
>> find another way to solve the following problem:
> You can build Xerces-C with the ICU transcoding service, which  
> supports offsets.  You'll need to build a copy of the ICU first, but  
> that shouldn't be difficult.  However, do not build Xerces-C using  
> the ICU as a message loader, because newer versions of the ICU  
> don't work with Xerces-C.

	Is there some trick to building against an installed version of ICU?  
Or does xercesc actually need the source tree?

Best regards,
Byron Campen


Re: SAX2, source offsets, and OS X

Posted by David Bertoni <db...@apache.org>.
Byron Campen wrote:
>     When trying to use the SAXParser::getSrcOffset() call, I'm getting a 
> runtime exception stating "The current transcoding service does not 
> support source offset information". I have set the 
> fgXercesCalculateSrcOfs bool to true, and a little googling seems to 
> indicate that source offsets are not supported in the 
> MacOSUnicodeConverter.
> 
>     Is there some sort of workaround for this? If not, I need to find 
> another way to solve the following problem:
You can build Xerces-C with the ICU transcoding service, which supports 
offsets.  You'll need to build a copy of the ICU first, but that 
shouldn't be difficult.  However, do not build Xerces-C using the ICU as 
a message loader, because newer versions of the ICU don't work with 
Xerces-C.

> 
>     When encountering a specific element, I want to be able to get the raw 
> character buffer of everything inside that element (not just the 
> text-value of the element, the raw buffer containing all descendants and 
> such; stuff that the characters() callback doesn't give you). I /could/ 
> assemble this buffer piecemeal from the callbacks I get from the parser, 
> but that would be a really hideous solution.
I have to warn you that source offsets are not well tested.  There are 
some known bugs in reporting offsets correctly in cases where there are 
transcoding errors, but perhaps that won't affect your use case.

Dave