Posted to dev@activemq.apache.org by Timothy Bish <ta...@gmail.com> on 2009/09/01 00:41:57 UTC

RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

On Mon, 2009-08-31 at 08:52 -0700, rainy3 wrote:
> Hello,
> 
> I just need more clarification on this case. Assume that I have an ActiveMQ
> broker, a Java producer and a C++ consumer for text messages. The producer
> reads some text from a database, for example in the ISO-8859-1 charset. Then
> it creates a text message (conversion to UTF-8) and sends it to the broker.
> Finally, the consumer won't be able to read these text messages because the
> backward conversion (C++ client) allows only ASCII chars, right?

When the String value in the TextMessage (Java Strings are UTF-16) is
written on the wire, it's converted to a modified UTF-8 representation.
The C++ client expects every string value it reads from the broker to be
in this representation, so it knows how to read the UTF-8 values and
convert them back to single-byte char values.  The CMS API doesn't
currently have methods for dealing with multibyte C++ strings, so strings
are expected to be encoded with values from 0-255.  For ISO-8859-1 that
should be OK, since every character in that set has a value in the
0-255 range, so your strings should be read fine.
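
The read-side conversion described above can be sketched roughly as
follows. This is an illustration only, not the ActiveMQ-CPP source: the
name utf8ToSingleByte is hypothetical, it takes an already extracted byte
string rather than a DataInputStream, and it handles only one- and
two-byte sequences, rejecting anything that would decode above 255.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ActiveMQ-CPP): decodes UTF-8 bytes into a
// string that holds one byte (0-255) per character.
std::string utf8ToSingleByte(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(utf8[i]);
        if (a < 0x80) {
            // Plain ASCII byte, copied through unchanged.
            out.push_back(static_cast<char>(a));
        } else if ((a & 0xE0) == 0xC0) {
            // Lead byte of a two-byte sequence: 110xxxxx 10yyyyyy.
            if (i + 1 >= utf8.size()) {
                throw std::runtime_error("truncated two-byte sequence");
            }
            const unsigned char b = static_cast<unsigned char>(utf8[++i]);
            if ((b & 0xC0) != 0x80) {
                throw std::runtime_error("invalid continuation byte");
            }
            // Any of the upper three payload bits set means the value is > 255.
            if (a & 0x1C) {
                throw std::range_error("code point above 255");
            }
            out.push_back(static_cast<char>(((a & 0x03) << 6) | (b & 0x3F)));
        } else {
            // Three-byte and longer sequences always decode above 255.
            throw std::range_error("unsupported multi-byte sequence");
        }
    }
    return out;
}
```

With this sketch, the bytes 0xC3 0xA9 (U+00E9) come back as the single
byte 0xE9, while 0xC4 0x84 is rejected because it decodes to a value
above 255.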

Of course the C++ string you get out is made up of char values, and char
is a signed byte on most platforms, but you can always cast the data to
unsigned char* and use the full range 0-255 as you please.
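
As a small sketch of that cast, the helper below (byteValues is a
made-up name for illustration) reads each char through unsigned char so
the values come out in 0-255 whether or not plain char is signed on the
platform in use.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns each byte of the string as a value in 0-255,
// regardless of whether plain char is signed on this platform.
std::vector<unsigned int> byteValues(const std::string& text) {
    std::vector<unsigned int> values;
    values.reserve(text.size());
    for (char c : text) {
        // Converting through unsigned char avoids negative values for
        // bytes at or above 0x80.
        values.push_back(static_cast<unsigned char>(c));
    }
    return values;
}
```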

Does that help?

Regards
Tim.


-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
I think that properly used getContent would solve my problem. Thanks for
all the help.
-- 
View this message in context: http://www.nabble.com/ActiveMQcpp-cms%3A%3ASession%3A%3AcreateTextMessage%28%29-function-usage-question-with-ISO-8859-1-strings-tp24747592p25263459.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.


Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Wed, 2009-09-02 at 10:06 -0700, martin@schlapfer.com wrote:
> Ah, I see.
> 
> Although I have not tried this, it looks like the getContent() method is
> public, so you could use the getContent() method rather than the getText()
> method.
> 
> You can then transform the payload data in the text message yourself
> (doing what was done in 2.2.1 getText() method) as a work around.
> 

You can grab the raw data from the getContent method, and that will work
fine.  Just remember that the first four bytes are the size prefix of the
encoded data, and that the data is encoded in Java's Modified UTF-8
format, not plain UTF-8, so the null character (U+0000) is encoded as
0xC0,0x80 rather than 0x00. See:

http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
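
A minimal sketch of pulling the encoded bytes out of such a buffer is
shown here. It assumes the content is available as a
std::vector<unsigned char> and that the four-byte size prefix is
big-endian (as Java's DataOutput writes an int); rawTextPayload is a
hypothetical name, not a CMS or ActiveMQ-CPP function. The returned bytes
are still Modified UTF-8 and would need converting afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: strips the four-byte size prefix from a text message
// body and returns the encoded bytes that follow it, without decoding them.
std::string rawTextPayload(const std::vector<unsigned char>& content) {
    if (content.size() <= 4) {
        return std::string();
    }
    // Assumption: the prefix is a big-endian 32-bit length.
    const std::uint32_t length =
        (static_cast<std::uint32_t>(content[0]) << 24) |
        (static_cast<std::uint32_t>(content[1]) << 16) |
        (static_cast<std::uint32_t>(content[2]) << 8) |
        static_cast<std::uint32_t>(content[3]);
    if (length > content.size() - 4) {
        throw std::length_error("size prefix exceeds available data");
    }
    const std::vector<unsigned char>::const_iterator first = content.begin() + 4;
    return std::string(first, first + static_cast<std::ptrdiff_t>(length));
}
```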

Regards
Tim.

> On Wed, September 2, 2009 12:47 am, rainy3 wrote:
> > [snip]




Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by ma...@schlapfer.com.
Ah, I see.

Although I have not tried this, it looks like the getContent() method is
public, so you could use the getContent() method rather than the getText()
method.

You can then transform the payload data in the text message yourself
(doing what was done in the 2.2.1 getText() method) as a workaround.

On Wed, September 2, 2009 12:47 am, rainy3 wrote:
> [snip]



Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
>> So it should be transparent to the user.  Your example should still work 
>> when encoded to UTF-8, transmitted on the wire, and decoded back into a 
>> character string (ie. treat 0xC4 as one character and 0x84 as the 
>> subsequent character in the std::string).

ActiveMQTextMessage.cpp (version 2.2.1)

std::string ActiveMQTextMessage::getText() const throw( cms::CMSException )
{

    try{
        if( getContent().size() <= 4 ) {
            return "";
        }

        return std::string( (const char*)&getContent()[4],
                            getContent().size()-4 );
    }
    AMQ_CATCH_RETHROW( ActiveMQException )
    AMQ_CATCH_EXCEPTION_CONVERT( Exception, ActiveMQException )
    AMQ_CATCHALL_THROW( ActiveMQException )
} 

ActiveMQTextMessage.cpp (version 3.0.1)

std::string ActiveMQTextMessage::getText() const throw( cms::CMSException )
{

    try{
        if( getContent().size() <= 4 ) {
            return "";
        }

        decaf::io::ByteArrayInputStream bais( getContent() );
        decaf::io::DataInputStream dataIn( &bais );

        return OpenwireStringSupport::readString( dataIn );
    }
    AMQ_CATCH_ALL_THROW_CMSEXCEPTION()
} 

std::string OpenwireStringSupport::readString( decaf::io::DataInputStream& dataIn )
    throw ( decaf::io::IOException ) {

    try {

        int utfLength = dataIn.readInt();

        if( utfLength <= 0 ) {
            return "";
        } 

...

                // a = 0xC4 so, here is a place where fail occurs
                if( a & 0x1C ) {
                    throw UTFDataFormatException(
                        __FILE__, __LINE__,
                        "Invalid 2 byte UTF-8 encoding found, "
                        "This method only supports encoded ASCII values of (0-255)." );
                }
 
...

        return std::string( (char*)( &result[0] ), index );
    }
    AMQ_CATCH_RETHROW( decaf::io::IOException )
    AMQ_CATCH_EXCEPTION_CONVERT( Exception, decaf::io::IOException )
    AMQ_CATCHALL_THROW( decaf::io::IOException )
} 

I don't have access to the producer's Java source, but as far as I know it
reads text from a database and then sends it as a TextMessage to the ActiveMQ
broker. During/before sending, the character 'Ą' (U+0104, written 'A,' in
ASCII) is UTF-8 encoded as the octet stream 0xC484, so the new getText method
can't convert it back.
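
To make the arithmetic behind that failure concrete, here is a small
sketch. Both function names are made up for illustration and neither is
taken from ActiveMQ-CPP; the only thing borrowed from the listing above
is the 0x1C mask.

```cpp
// Hypothetical helper: combines a two-byte UTF-8 sequence (110xxxxx 10yyyyyy)
// into its code point. No validation is done; this is for illustration only.
unsigned int twoByteCodePoint(unsigned char a, unsigned char b) {
    return ((a & 0x1Fu) << 6) | (b & 0x3Fu);
}

// True when the lead byte carries any of the three payload bits that push the
// decoded value above 255. This is the same mask as in the quoted check.
bool exceedsSingleByte(unsigned char a) {
    return (a & 0x1C) != 0;
}
```

For the bytes 0xC4 0x84 the code point works out to 0x104, which is above
255, and 0xC4 & 0x1C is 0x04, so the check fires. For 0xC3 0xA9 the code
point is 0xE9 and the mask test is zero, so such a character passes.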



Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Martin Schlapfer <ma...@schlapfer.com>.
The string representation remains the same, which is a std::string 
between C++ clients.  It is the *transmission* of the std::string on the 
wire between the client and the server that transforms the string from 
0-255 byte values to UTF-8.

So it should be transparent to the user.  Your example should still work 
when encoded to UTF-8, transmitted on the wire, and decoded back into a 
character string (ie. treat 0xC4 as one character and 0x84 as the 
subsequent character in the std::string).

Unless of course you are talking about a Java client transmitting a 
character value greater than 255 across the wire to a C++ client.  Then 
of course, this will not work.  It would not have worked in previous 
versions either.
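
A rough sketch of the write-side transformation described above, under
the assumption that each char in the std::string is treated as a code
point in 0-255. The name singleByteToModifiedUtf8 is hypothetical and
this is not the library's own writer; the handling of the zero byte
follows the Modified UTF-8 rule mentioned elsewhere in this thread.

```cpp
#include <string>

// Hypothetical helper: encodes a string of single-byte values (0-255) as
// Modified UTF-8, treating every byte as one code point.
std::string singleByteToModifiedUtf8(const std::string& text) {
    std::string out;
    for (char ch : text) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c != 0 && c < 0x80) {
            // ASCII other than NUL is written as a single byte.
            out.push_back(ch);
        } else {
            // NUL and 0x80-0xFF become a two-byte sequence 110xxxxx 10yyyyyy.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Under this sketch a C++ string holding the two bytes 0xC4 0x84 goes on
the wire as 0xC3 0x84 0xC2 0x84, that is, as two separate characters,
which is why the round trip between C++ clients is unaffected.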

rainy3 said the following on 9/1/2009 8:52 AM:
>> The CMS API doesn't currently have methods for dealing with multibyte C++ 
>> strings so strings are expected to be encoded with values from 0-255.
>>     
>
> So with the new ActiveMQcpp, a C++ consumer won't be able to convert back a
> text message containing a UTF-8 character such as 0xC484 (these two bytes
> stand for a single letter, 'Ą', written 'A,' in ASCII). It's kind of sad,
> because the old consumer was able to deal with this (it used iconv to convert
> the text message from UTF-8 to the proper charset, and it worked like a
> charm). Now, I guess, the only way to keep this possible is to rewrite both
> producer and consumer to use binary messages. Maybe in a future release of
> ActiveMQcpp there could be some kind of system-dependent compilation that
> allows Linux/Unix systems with native iconv support to deal with this kind of
> text message?
>   


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
> The CMS API doesn't currently have methods for dealing with multibyte C++ 
> strings so strings are expected to be encoded with values from 0-255.

So with the new ActiveMQcpp, a C++ consumer won't be able to convert back a
text message containing a UTF-8 character such as 0xC484 (these two bytes
stand for a single letter, 'Ą', written 'A,' in ASCII). It's kind of sad,
because the old consumer was able to deal with this (it used iconv to convert
the text message from UTF-8 to the proper charset, and it worked like a
charm). Now, I guess, the only way to keep this possible is to rewrite both
producer and consumer to use binary messages. Maybe in a future release of
ActiveMQcpp there could be some kind of system-dependent compilation that
allows Linux/Unix systems with native iconv support to deal with this kind of
text message?