Posted to dev@activemq.apache.org by Bhavani Iyer <bh...@gmail.com> on 2010/01/20 18:10:02 UTC

Re: [jira] Created: (AMQCPP-261) Handle Multibyte Strings or Strings encoded in Charsets other than US-ASCII

Hi,

I recently migrated from 2.1.3 to 3.1.0 and found that I can no longer send
UTF-8 multibyte characters in the payload of a TextMessage. As a workaround
until this issue is resolved, I modified the ActiveMQTextMessage::getText()
method to return the raw bytes from getContent(), bypassing the call to
OpenWireConnector::readString().

Here is the modified ActiveMQTextMessage::getText()

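if( this->text.get() != NULL ) {
    // Return the cached text if it is already available.
    return *( this->text.get() );
} else {
    // Nothing follows the leading 4 bytes, so there is no text to return.
    if( this->getContent().size() <= 4 ) { return ""; }
    // To get around the ASCII text restriction, return the raw bytes unchanged.
    return std::string( (const char*)&getContent()[4], getContent().size()-4 );
}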

Please let me know if this has any unintended consequence.
Thanks
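
For readers who want to try the idea outside the client library, here is a minimal standalone sketch of the same byte handling. The function names are hypothetical and are not part of ActiveMQ-CPP. It assumes the content buffer starts with a 4-byte big-endian byte count followed by the string bytes. The bytes are returned exactly as stored, so anything the sender encoded in Modified UTF-8 (such as embedded nulls or characters outside the Basic Multilingual Plane) comes through undecoded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not ActiveMQ-CPP API): returns the bytes that follow
// the leading 4-byte prefix, without decoding or validating them.
std::string rawTextFromContent( const std::vector<unsigned char>& content ) {
    const std::size_t prefixSize = 4;  // assumed size of the length prefix
    if( content.size() <= prefixSize ) {
        return std::string();
    }
    return std::string( reinterpret_cast<const char*>( &content[prefixSize] ),
                        content.size() - prefixSize );
}

// Hypothetical sanity check: assumes the prefix is a big-endian byte count
// and reports whether it matches the number of bytes that actually follow.
bool prefixMatchesPayload( const std::vector<unsigned char>& content ) {
    if( content.size() < 4 ) {
        return false;
    }
    const unsigned long declared =
        ( static_cast<unsigned long>( content[0] ) << 24 ) |
        ( static_cast<unsigned long>( content[1] ) << 16 ) |
        ( static_cast<unsigned long>( content[2] ) << 8 ) |
        static_cast<unsigned long>( content[3] );
    return declared == content.size() - 4;
}
```

Checking the prefix before trusting the payload is one way to notice a truncated or differently framed buffer early.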

JIRA jira@apache.org wrote:
> 
> Handle Multibyte Strings or Strings encoded in Charsets other than
> US-ASCII
> ---------------------------------------------------------------------------
> 
>                  Key: AMQCPP-261
>                  URL: https://issues.apache.org/activemq/browse/AMQCPP-261
>              Project: ActiveMQ C++ Client
>           Issue Type: Improvement
>           Components: CMS Impl, Decaf, Openwire
>     Affects Versions: 3.0.1
>             Reporter: Timothy Bish
>             Assignee: Timothy Bish
> 
> 
> The CMS API defines the interface for Strings in the TextMessage using the
> C++ std::string and const char* primitives and doesn't consider character
> encodings in its interface or the use of multibyte string representations.  
> 
> In order to allow the use of Strings between Java, C++, and .NET clients,
> the strings in the TextMessage, as well as those in MapMessage,
> StreamMessage, and BytesMessage (when writeUTF and readUTF are called) and
> message properties of the string type, are encoded in the Java
> standard Modified UTF-8 format for serialized strings.  This design makes
> the assumption that strings passed are in US-ASCII format and that the
> strings from the broker are also encoded with no char values greater than
> 255, and throws an exception if one is encountered.  
> 
> The CMS interface needs to be extended to allow for more flexible string
> handling and offer a mechanism to deal with string encodings other than
> ASCII. 
> 
> Another alternative is to change the assumption about strings in the CMS
> API to assume that all strings are given either as ASCII strings with chars
> < 127 and no embedded nulls, or as strings already encoded by the user as
> Modified UTF-8, so that a Java or .NET client can read all strings sent
> in CMS Messages as well.
> 
> 
> 
> 
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> 
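
The Modified UTF-8 format named in the issue description differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80 instead of a single zero byte, and code points above U+FFFF are written as a UTF-16 surrogate pair with each half encoded as three bytes. The following is a rough sketch of such an encoder, not code from ActiveMQ-CPP. The function names are hypothetical, and the input is assumed to contain only valid Unicode scalar values.

```cpp
#include <string>

// Appends a value in the range 0x0800..0xFFFF as three bytes.
static void appendThreeBytes( std::string& out, char32_t v ) {
    out += static_cast<char>( 0xE0 | ( v >> 12 ) );
    out += static_cast<char>( 0x80 | ( ( v >> 6 ) & 0x3F ) );
    out += static_cast<char>( 0x80 | ( v & 0x3F ) );
}

// Hypothetical helper: encodes code points as Java "Modified UTF-8".
// Assumes every element of `text` is a valid Unicode scalar value.
std::string encodeModifiedUtf8( const std::u32string& text ) {
    std::string out;
    for( std::u32string::size_type i = 0; i < text.size(); ++i ) {
        const char32_t cp = text[i];
        if( cp >= 0x01 && cp <= 0x7F ) {
            // Plain ASCII (except NUL) is a single byte.
            out += static_cast<char>( cp );
        } else if( cp <= 0x7FF ) {
            // Two-byte form; this branch also handles U+0000 as C0 80.
            out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        } else if( cp <= 0xFFFF ) {
            appendThreeBytes( out, cp );
        } else {
            // Supplementary code point: split into a surrogate pair and
            // encode each half as three bytes (six bytes in total).
            const char32_t v = cp - 0x10000;
            appendThreeBytes( out, 0xD800 + ( v >> 10 ) );
            appendThreeBytes( out, 0xDC00 + ( v & 0x3FF ) );
        }
    }
    return out;
}
```

For strings made only of characters in the range U+0001 to U+FFFF, the output is byte-for-byte the same as standard UTF-8.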

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28AMQCPP-261%29-Handle-Multibyte-Strings-or-Strings-encoded-in-Charsets-other-than-US-ASCII-tp25265866p27245213.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.


Re: [jira] Created: (AMQCPP-261) Handle Multibyte Strings or Strings encoded in Charsets other than US-ASCII

Posted by Timothy Bish <ta...@gmail.com>.
On Wed, 2010-01-20 at 09:10 -0800, Bhavani Iyer wrote:
> Hi,
> 
> I recently migrated from 2.1.3 to 3.1.0 and found that I can no longer send
> UTF-8 multibyte characters in the payload of a TextMessage. As a workaround
> until this issue is resolved, I modified the ActiveMQTextMessage::getText()
> method to return the raw bytes from getContent(), bypassing the call to
> OpenWireConnector::readString().
> 
> Here is the modified ActiveMQTextMessage::getText()
> 
> if( this->text.get() != NULL ) {
>     // Return the cached text if it is already available.
>     return *( this->text.get() );
> } else {
>     // Nothing follows the leading 4 bytes, so there is no text to return.
>     if( this->getContent().size() <= 4 ) { return ""; }
>     // To get around the ASCII text restriction, return the raw bytes unchanged.
>     return std::string( (const char*)&getContent()[4], getContent().size()-4 );
> }
> 
> Please let me know if this has any unintended consequence.
> Thanks

There shouldn't be any impact on the client from this change.  I think the fix
for this issue will be to remove the string encoding altogether and just
require that anyone who wants to send anything more than plain ASCII
do the UTF-8 encoding themselves.
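
As a rough illustration of what doing the encoding on the application side could look like, the sketch below checks whether a string is plain ASCII and, if not, builds standard UTF-8 from Unicode code points. Both function names are hypothetical and are not part of the CMS API. The ASCII check treats bytes 1 through 127 as plain ASCII, which is an assumption about where the boundary would be drawn.

```cpp
#include <string>

// Hypothetical check: true when every byte is in 1..127 (no embedded nulls).
bool isPlainAscii( const std::string& s ) {
    for( std::string::size_type i = 0; i < s.size(); ++i ) {
        const unsigned char c = static_cast<unsigned char>( s[i] );
        if( c == 0 || c > 0x7F ) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: appends one code point to `out` as standard UTF-8.
// Returns false (leaving `out` unchanged) for surrogates and for values
// above U+10FFFF, which are not valid Unicode scalar values.
bool appendUtf8( std::string& out, char32_t cp ) {
    if( cp <= 0x7F ) {
        out += static_cast<char>( cp );
    } else if( cp <= 0x7FF ) {
        out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    } else if( cp <= 0xFFFF ) {
        if( cp >= 0xD800 && cp <= 0xDFFF ) {
            return false;
        }
        out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    } else if( cp <= 0x10FFFF ) {
        out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    } else {
        return false;
    }
    return true;
}
```

The resulting std::string holds the encoded bytes and could be handed to a TextMessage as-is, provided the receiving side decodes the payload as UTF-8 as well.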

Regards
Tim.



-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/