Posted to c-dev@axis.apache.org by Susantha Kumara <su...@opensource.lk> on 2004/05/07 07:19:00 UTC

Encoding support - [WAS] RE: STL elimination - Suggestions

Hi Lilantha,


>-----Original Message-----
>From: Lilantha Darshana [mailto:ldarshana@edocs.com] 
>Sent: Friday, May 07, 2004 4:11 AM
>To: 'Apache AXIS C Developers List'
>Subject: RE: STL elimination - Suggestions
>
>I would suggest at this point going for wchar_t * for string handling to
>support Unicode char-sets.

What I have in mind is to make it a compile-time decision whether Axis
uses UTF-8 or UTF-16, so Axis can be built either way. This does not
mean that the Axis engine itself has to handle the other encoding: it is
the responsibility of the XML parser to transcode the XML data to the
encoding that Axis has been built for.

For example, if a UTF-8 build of Axis receives a message in UTF-16, the
XML parser should transcode the data to UTF-8, and all messages sent out
will always be in UTF-8.

Likewise, if a UTF-16 build of Axis receives a message in UTF-8, the XML
parser should transcode the data to UTF-16, and all messages sent out
will always be in UTF-16.

I think this is OK because the WS-I profile says:

"while a sender might choose whether to encode XML in UTF-8 or UTF-16
when sending a message, a receiver must be capable of using either"

To do this we have to define an AxisChar type, so that a compiler
directive decides whether AxisChar is char (8-bit) or short (16-bit).
This means that we need string manipulation functions like strcmp,
strcpy etc. that can be used with either char or short strings. To solve
this problem we might be able to use an existing open source toolkit
rather than writing our own set of functions to replace strcmp, strcpy,
strcat etc.
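
A minimal sketch of the compile-time switch described above. The macro
name AXIS_UTF16 and the axis_* helper names are assumptions made for
illustration only; they are not taken from the Axis code base. The
helpers work on code units, so the same source compiles for either
width of AxisChar.

```cpp
#include <cstddef>

// Hypothetical build switch: compile with -DAXIS_UTF16 for a UTF-16 build.
#ifdef AXIS_UTF16
typedef unsigned short AxisChar;  // one UTF-16 code unit (16-bit on common platforms)
#else
typedef char AxisChar;            // one UTF-8 code unit
#endif

// Widen a code unit to an unsigned value so that ordering is well defined
// for both the char build and the unsigned short build.
inline unsigned int axis_unit(char c) { return static_cast<unsigned char>(c); }
inline unsigned int axis_unit(unsigned short c) { return c; }

// Length in code units (not characters), excluding the terminator.
inline std::size_t axis_strlen(const AxisChar* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Copy src, including the terminator, into dst; dst must be large enough.
inline AxisChar* axis_strcpy(AxisChar* dst, const AxisChar* src) {
    AxisChar* p = dst;
    while ((*p++ = *src++) != 0) {}
    return dst;
}

// Compare code unit by code unit; returns <0, 0 or >0 like strcmp.
inline int axis_strcmp(const AxisChar* a, const AxisChar* b) {
    while (*a != 0 && *a == *b) { ++a; ++b; }
    unsigned int ua = axis_unit(*a);
    unsigned int ub = axis_unit(*b);
    return (ua > ub) - (ua < ub);
}
```

Note that these helpers count and compare code units only. Anything
that needs whole characters (for example surrogate pairs in a UTF-16
build) would still need extra care or an existing toolkit.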

---
Susantha.

>Think twice before you replace std::string with your own implementation
>of a String class. Do a performance comparison and a usability study.
>Your class should outperform std::string in the context in which you use
>it; otherwise stick with std::string.
>
>The same goes for writing your own Map class: std::map implementations
>use the best algorithms available today, so think twice before writing
>your own. If it is possible to replace key/value lookup with index/value
>lookup, then just use an array; otherwise it is overkill and harder to
>use. void pointer casting requires some extra instructions in the
>machine code, whereas compile-time binding does not add any instructions
>to the object code.
>
>But if you can keep your code simple and avoid using std::map, that
>really improves performance, provided simple C code/algorithms can do
>the same task.
>
>We can also avoid using the following in most cases, if the size of the
>collection is known beforehand, by using a simple array:
>
>std::queue
>std::list
>std::stack
>
>regards 
>-Lilantha 
>
>
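
The point made above about collections whose size is known beforehand
can be illustrated with a fixed-capacity stack over a plain array. This
is only a sketch: the class name FixedStack and the capacity of 16 are
assumptions chosen for the example, not anything from the Axis sources.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity stack of void* backed by a plain array.
// It never allocates and never grows.
class FixedStack {
public:
    enum { CAPACITY = 16 };  // assumed upper bound, for illustration only

    FixedStack() : m_size(0) {}

    // Returns false instead of growing when the stack is full.
    bool push(void* item) {
        if (m_size == CAPACITY) return false;
        m_items[m_size++] = item;
        return true;
    }

    // Returns a null pointer when the stack is empty.
    void* pop() {
        if (m_size == 0) return 0;
        return m_items[--m_size];
    }

    std::size_t size() const { return m_size; }

private:
    void* m_items[CAPACITY];
    std::size_t m_size;
};
```

The trade-off is the one named in the message: this is cheaper and
simpler than std::stack, but only usable where the upper bound really is
known in advance.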
>-----Original Message-----
>From: Susantha Kumara [mailto:susantha@opensource.lk]
>Sent: Wednesday, May 05, 2004 11:30 PM
>To: axis-c-dev
>Subject: STL elimination - Suggestions
>
>
>Hi all,
> 
>I would suggest the following order and steps to gradually eliminate STL
>in the Axis C++ codebase.
>
>std::string
>  1. Use "char*" instead of "string" wherever possible - need to take
>     care of the memory allocation and deallocation.
>  2. Write our own String class to replace the strings left after step 1.
>
>std::map
>  1. Review the code base and minimize the use of std::map's functions.
>     Replace std::map with arrays or lists where possible.
>  2. Write our own Map class with minimum functionality and use it in the
>     places still remaining after step 1. Even if we don't use templates
>     to write this map class, we can make it almost generic if we use a
>     class which has:
>     - void* for key
>     - void* for value
>     - allocator/deallocator provided externally
>     - key comparison function provided externally
>
>std::queue
>std::list
>std::stack
>  Probably we can use linked lists to replace these.
> 
> 
>---
>Susantha.
>
>
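
A rough sketch of the non-template map proposed in the quoted message:
void* keys and values, with the key comparison supplied by the caller.
The names AxisMap, AxisKeyCompare and compareCStrings are hypothetical.
For brevity this version allocates its nodes with new/delete rather
than through an externally supplied allocator, and it does a linear
search over a linked list.

```cpp
#include <cstddef>
#include <cstring>

// Caller-supplied key comparison: returns 0 when the keys are equal.
typedef int (*AxisKeyCompare)(const void* a, const void* b);

// Hypothetical non-template map. It does not own its keys or values.
class AxisMap {
public:
    explicit AxisMap(AxisKeyCompare cmp) : m_cmp(cmp), m_head(0) {}

    ~AxisMap() {
        while (m_head) {
            Node* next = m_head->next;
            delete m_head;
            m_head = next;
        }
    }

    // Insert a new entry, or overwrite the value of an existing key.
    void put(void* key, void* value) {
        for (Node* n = m_head; n; n = n->next) {
            if (m_cmp(n->key, key) == 0) { n->value = value; return; }
        }
        Node* n = new Node;
        n->key = key;
        n->value = value;
        n->next = m_head;
        m_head = n;
    }

    // Returns the stored value, or a null pointer if the key is absent.
    void* get(const void* key) const {
        for (Node* n = m_head; n; n = n->next) {
            if (m_cmp(n->key, key) == 0) return n->value;
        }
        return 0;
    }

private:
    struct Node { void* key; void* value; Node* next; };

    AxisMap(const AxisMap&);             // copying disabled
    AxisMap& operator=(const AxisMap&);

    AxisKeyCompare m_cmp;
    Node* m_head;
};

// Example comparator for keys that are C strings.
inline int compareCStrings(const void* a, const void* b) {
    return std::strcmp(static_cast<const char*>(a),
                       static_cast<const char*>(b));
}
```

Every lookup goes through a function pointer and the caller has to cast
the void* result back to the real type, which is the cost Lilantha
points out when comparing this approach with compile-time binding.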



Re: Encoding support - [WAS] RE: STL elimination - Suggestions

Posted by John Hawkins <HA...@uk.ibm.com>.



Could ICU4C help us with Unicode issues here?



John Hawkins,



                                                                           









RE: Encoding support - [WAS] RE: STL elimination - Suggestions

Posted by Susantha Kumara <su...@opensource.lk>.
Hi Samisa,

> -----Original Message-----
> From: Samisa Abeysinghe [mailto:samisa_abeysinghe@yahoo.com]
> Sent: Friday, May 07, 2004 2:13 PM
> To: Apache AXIS C Developers List
> Subject: Re: Encoding support - [WAS] RE: STL elimination - Suggestions
> 
> Susantha,
>    I am not clear about this.
> > What I have in mind is to make it a compile-time decision whether Axis
> > uses UTF-8 or UTF-16, so Axis can be built either way. This does not
> > mean that the Axis engine itself has to handle the other encoding: it
> > is the responsibility of the XML parser to transcode the XML data to
> > the encoding that Axis has been built for.
> 
>    Is it the responsibility of the parser to take care of the encoding?

Yes. A parser should parse both utf-8 and utf-16 and convert the data
into the application's encoding type.

>    Rather, my understanding is that the serializer is responsible for
> the encoding to be used for outgoing messages.

Yes, but only if we need to be able to serialize in both UTF-8 and
UTF-16, with the choice made at runtime. WS-I allows a SOAP engine to
choose one of the encodings for outgoing messages.

>    Hence there needs to be an abstraction layer for the encoder, whose
> services would be used by the serializer.
> 
> Thanks,
> Samisa...
> 
> 
> 
> 




Re: Encoding support - [WAS] RE: STL elimination - Suggestions

Posted by Samisa Abeysinghe <sa...@yahoo.com>.
Susantha,
   I am not clear about this.
> What I have in mind is to make it a compile-time decision whether Axis
> uses UTF-8 or UTF-16, so Axis can be built either way. This does not
> mean that the Axis engine itself has to handle the other encoding: it
> is the responsibility of the XML parser to transcode the XML data to
> the encoding that Axis has been built for.

   Is it the responsibility of the parser to take care of the encoding?
   Rather, my understanding is that the serializer is responsible for the encoding to be
used for outgoing messages.
   Hence there needs to be an abstraction layer for the encoder, whose services would be used by
the serializer.
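
One possible shape for such an encoder layer, as a sketch only. All the
names here (ByteBuffer, Encoder, Utf8Encoder, Utf16BeEncoder, serialize)
are hypothetical and not taken from the Axis code; the fixed-size buffer
is just there to keep the example free of STL.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity byte sink. It silently drops bytes when full;
// a real sink would grow or report an error.
struct ByteBuffer {
    unsigned char data[64];
    std::size_t size;
    ByteBuffer() : size(0) {}
    void put(unsigned long b) {
        if (size < sizeof data) data[size++] = static_cast<unsigned char>(b & 0xFF);
    }
};

// Abstract encoder: turns one Unicode code point into bytes.
class Encoder {
public:
    virtual ~Encoder() {}
    virtual void encode(unsigned long cp, ByteBuffer& out) const = 0;
};

class Utf8Encoder : public Encoder {
public:
    virtual void encode(unsigned long cp, ByteBuffer& out) const {
        if (cp < 0x80) {
            out.put(cp);
        } else if (cp < 0x800) {
            out.put(0xC0 | (cp >> 6));
            out.put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.put(0xE0 | (cp >> 12));
            out.put(0x80 | ((cp >> 6) & 0x3F));
            out.put(0x80 | (cp & 0x3F));
        } else {
            out.put(0xF0 | (cp >> 18));
            out.put(0x80 | ((cp >> 12) & 0x3F));
            out.put(0x80 | ((cp >> 6) & 0x3F));
            out.put(0x80 | (cp & 0x3F));
        }
    }
};

// Big-endian UTF-16, written without a byte order mark.
class Utf16BeEncoder : public Encoder {
public:
    virtual void encode(unsigned long cp, ByteBuffer& out) const {
        if (cp < 0x10000) {
            putUnit(cp, out);
        } else {
            cp -= 0x10000;
            putUnit(0xD800 + (cp >> 10), out);    // high surrogate
            putUnit(0xDC00 + (cp & 0x3FF), out);  // low surrogate
        }
    }
private:
    static void putUnit(unsigned long u, ByteBuffer& out) {
        out.put(u >> 8);
        out.put(u & 0xFF);
    }
};

// The serializer depends only on the Encoder interface, so the output
// encoding can be picked at runtime.
inline void serialize(const unsigned long* cps, std::size_t n,
                      const Encoder& enc, ByteBuffer& out) {
    for (std::size_t i = 0; i < n; ++i) enc.encode(cps[i], out);
}
```

The virtual call per code point is the price of choosing the encoding
at runtime; a build that fixes the encoding at compile time would not
need this layer.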

Thanks,
Samisa...






	
		