
Posted to c-dev@xerces.apache.org by Luis Díaz <ld...@dextratech.com> on 2000/12/05 00:21:40 UTC

Escape Sequences

Hi guys,
I've recently been assigned to fix a problem involving Xerces, and I am
really new to Xerces.

Our application derives a class from HandlerBase and uses a SAXParser
inside that class.
The application works fine.
It works as follows: first we create a new SAXParser, then we receive the
input file name and start processing it with parser->parseFirst(), followed
by calls to parser->parseNext().

The whole document gets processed, BUT when the document includes escape
sequences such as &lt;B&gt;, the program only returns the part of the string
before the ampersand.
For example, if the incoming string is:
<TAG> This is a simple line &lt;B&gt; of text &lt;/B&gt;</TAG>

While debugging I finally saw that the line was being split into
several parts:
First one:  This is a simple line <B
Second one: >
Third one:  of text </B
Fourth one: >

After the parsing it only returns
This is a simple line.

My question is: what do I have to do to process the escape sequences within
my application?

I am handling the characters() callback and startElement() too.

Cheers
Luis Beltrán Díaz Ramírez          ldiaz@dextratech.com
Software Engineer
Dextra Technologies
Monterrey NL MEXICO
+[52](8)130-2000 x 2051

Re: Escape Sequences

Posted by Dean Roddey <dr...@charmedquark.com>.

Oh ok, I should have read this one before I answered the other one. *All*
parsers are allowed to do this, and it's explicitly mentioned in the XML spec
that you must be prepared to deal with it. Because of the rules for
parsers related to reporting the start/end of entities, it's far more
practical to stop at the start of an entity, report any chars so far, get
the text of the entity, report it, then start collecting chars again, and so
forth. So, in most cases, if you are getting chunks, it's because there are
entity references in the text.
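
To make the chunking concrete, here is a toy sketch of the
stop/report/resume cycle described above. It is not Xerces code: the
function name chunkAtEntities is hypothetical, it only knows the references
&lt;, &gt; and &amp;, and the exact chunk boundaries a real parser chooses
may differ (the chunks reported in the original question do).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only (assumption, not the Xerces implementation): splits text at
// the references &lt; &gt; &amp; and returns the pieces in the order a
// chunking parser might hand them to characters().
std::vector<std::string> chunkAtEntities(const std::string& text) {
    std::vector<std::string> chunks;
    std::string pending;                 // characters collected so far
    std::size_t i = 0;
    while (i < text.size()) {
        std::string replacement;
        std::size_t refLen = 0;
        if (text.compare(i, 4, "&lt;") == 0)       { replacement = "<"; refLen = 4; }
        else if (text.compare(i, 4, "&gt;") == 0)  { replacement = ">"; refLen = 4; }
        else if (text.compare(i, 5, "&amp;") == 0) { replacement = "&"; refLen = 5; }

        if (refLen == 0) {               // ordinary character: keep collecting
            pending += text[i++];
            continue;
        }
        if (!pending.empty()) {          // report the chars collected so far
            chunks.push_back(pending);
            pending.clear();
        }
        chunks.push_back(replacement);   // report the text of the entity
        assert(refLen <= text.size() - i);
        i += refLen;                     // start collecting again after it
    }
    if (!pending.empty()) chunks.push_back(pending);
    return chunks;
}
```

Feeding it "a &lt;B&gt; b" yields five pieces whose concatenation is
"a <B> b", which is why a handler has to join the pieces itself.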

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"It takes two buttocks to make friction"
    - African Proverb


----- Original Message -----
From: "Mike Herring" <mi...@worldnet.att.net>
To: <xe...@xml.apache.org>
Sent: Monday, December 04, 2000 9:13 PM
Subject: Re: Escape Sequences


> I ran into this and resolved it by concatenating the data delivered to the
> handler characters() callback if there were no intervening tags.  The data
> all gets delivered by the parser  - just in pieces.
>
> Can't say I'm pleased with the solution but it works for now.  I couldn't
> see that any other handler callbacks were called as a result of the escape
> codes.

Re: Escape Sequences

Posted by Mike Herring <mi...@worldnet.att.net>.

I ran into this and resolved it by concatenating the data delivered to the
handler characters() callback if there were no intervening tags.  The data
all gets delivered by the parser  - just in pieces.

Can't say I'm pleased with the solution but it works for now.  I couldn't
see that any other handler callbacks were called as a result of the escape
codes.
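
The concatenation approach can be sketched roughly as follows. This is a
stand-in rather than real Xerces code: the class TextCollector and the
function replayExample are hypothetical, and the callbacks take std::string
and char instead of the XMLCh-based signatures that HandlerBase uses. The
chunks replayed are the ones from the original question, with whitespace
approximated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a HandlerBase-derived class (assumption: simplified
// signatures, std::string instead of XMLCh).
class TextCollector {
public:
    // A start tag begins a fresh run of text. Mixed content would need a
    // stack of buffers; this sketch only handles text with no nested tags.
    void startElement(const std::string& /*name*/) { buffer_.clear(); }

    // May be called several times for one run of text: append, never assign.
    void characters(const char* chars, std::size_t length) {
        if (length == 0) return;
        assert(chars != nullptr);
        buffer_.append(chars, length);
    }

    // Only at the end tag is the text known to be complete.
    void endElement(const std::string& /*name*/) {
        lastText_ = buffer_;
        buffer_.clear();
    }

    const std::string& lastText() const { return lastText_; }

private:
    std::string buffer_;    // text accumulated since the last start tag
    std::string lastText_;  // complete text of the most recently closed element
};

// Replays four chunks like those reported in the original question.
std::string replayExample() {
    TextCollector handler;
    handler.startElement("TAG");
    const char* chunks[] = {" This is a simple line <B", ">", " of text </B", ">"};
    for (const char* chunk : chunks) {
        handler.characters(chunk, std::char_traits<char>::length(chunk));
    }
    handler.endElement("TAG");
    return handler.lastText();
}
```

The point of the design is that nothing is treated as final inside
characters(); the text is only read once the end tag arrives.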


Re: changing typedefs of XMLCh gives errors.

Posted by Dean Roddey <dr...@charmedquark.com>.

If the compiler is compliant, wchar_t and unsigned short are not equivalent,
though they might happen to be the same size. wchar_t is supposed to be a
unique type of its own, and can be whatever size the implementation chooses,
though it would be kind of silly for it to be less than 2 bytes in size,
since it wouldn't be very wide or very useful otherwise. Compilers that
aren't compliant will use a typedef, usually to unsigned short. But it
could be a 32-bit value as well.
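
A small check along these lines can show which situation a given compiler
is in. It is only a sketch: the helper names isWide, sameSize and
checkDistinct are made up for illustration, and it assumes a C++11 compiler
(much newer than gcc 2.95). On a compiler that treats wchar_t as a typedef,
the static_assert and the overload pair would fail to compile, which is
itself the answer.

```cpp
#include <cassert>
#include <type_traits>

// On a conforming compiler wchar_t is a distinct built-in type, so this
// holds even where the two types happen to have the same size.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is expected to be a distinct type, not a typedef");

// Two overloads can only coexist because the parameter types are distinct.
inline bool isWide(wchar_t)        { return true; }
inline bool isWide(unsigned short) { return false; }

// Reports whether the sizes match on this platform; the language does not
// guarantee either answer.
inline bool sameSize() { return sizeof(wchar_t) == sizeof(unsigned short); }

// Runtime confirmation that overload resolution tells the types apart.
inline void checkDistinct() {
    assert(isWide(L'x'));
    assert(!isWide(static_cast<unsigned short>(120)));
}
```

If sameSize() returns false, an assertion comparing sizeof(XMLCh) against
sizeof(unsigned short) would be expected to fail, as reported in the
question.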

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"It takes two buttocks to make friction"
    - African Proverb


----- Original Message -----
From: "Abhi" <at...@Adobe.COM>
To: <xe...@xml.apache.org>
Sent: Monday, December 04, 2000 6:40 PM
Subject: changing typedefs of XMLCh gives errors.


> Hi all,
>
> I am having some problems with the wchar_t typedef for XMLCh on Solaris 2.6
> using the gcc 2.95 compiler. When I assert that the size of XMLCh (a
> typedef for wchar_t) matches that of another typedef, which is *unsigned
> short*, the assertion fails. So, when I change the typedef of XMLCh from
> wchar_t to unsigned short (which should be equivalent) in the file
> GCCDefs.hpp, I run into a problem while compiling: I get an *out of virtual
> memory* error, and this error comes from the equal() function in
> HashBase.cpp.
>
> Obviously, it seems, unsigned short and wchar_t do not have the same size
> on Solaris. But I can't figure out the reason behind the *out of virtual
> memory* error.
>
> Thanks,
> Abhi.

changing typedefs of XMLCh gives errors.

Posted by Abhi <at...@Adobe.COM>.

Hi all,

I am having some problems with the wchar_t typedef for XMLCh on Solaris 2.6
using the gcc 2.95 compiler. When I assert that the size of XMLCh (a
typedef for wchar_t) matches that of another typedef, which is *unsigned
short*, the assertion fails. So, when I change the typedef of XMLCh from
wchar_t to unsigned short (which should be equivalent) in the file
GCCDefs.hpp, I run into a problem while compiling: I get an *out of virtual
memory* error, and this error comes from the equal() function in
HashBase.cpp.

Obviously, it seems, unsigned short and wchar_t do not have the same size
on Solaris. But I can't figure out the reason behind the *out of virtual
memory* error.

Thanks,
Abhi.