Posted to dev@activemq.apache.org by Olivier Langlois <Ol...@streamtheworld.com> on 2009/07/30 23:24:01 UTC

ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Hi,

An exception is thrown by the java receiving side:

Caused by: java.io.UTFDataFormatException
        at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:366)
        at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:86)
        ... 17 more

The producer of this message uses ActiveMQcpp, and the string passed to createTextMessage() uses the ISO-8859-1 charset; it is not encoded in UTF-8. That string can contain characters with a value higher than 127, which may result in an invalid UTF-8 string.

I have checked the activemqcpp HTML doxygen documentation for this function to look for an answer, but the function description does not specify the requirements on the input string parameter. Does the function expect the string to be UTF-8?
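
To illustrate why an ISO-8859-1 string with bytes above 127 is usually not valid UTF-8, here is a minimal sketch of a structural UTF-8 check. The function name looksLikeUtf8 is hypothetical and not part of ActiveMQcpp, and the check is deliberately simplified (it does not reject every overlong or surrogate form).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: validates lead bytes and continuation bytes.
// Simplified on purpose: it does not reject every overlong or surrogate form.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                       // continuation bytes expected
        if (c < 0x80) extra = 0;                     // plain ASCII
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;  // two-byte sequence
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;  // three-byte sequence
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;  // four-byte sequence
        else return false;                           // stray continuation or invalid lead byte
        if (extra > s.size() - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) return false; // not a 10xxxxxx byte
        }
        i += extra + 1;
    }
    return true;
}

// Example: "cafe" with an acute accent ends with the single byte 0xE9 in ISO-8859-1.
void exampleLatin1IsNotUtf8() {
    assert(!looksLikeUtf8("caf\xE9"));     // 0xE9 announces two more bytes that never come
    assert(looksLikeUtf8("caf\xC3\xA9"));  // the same word encoded as UTF-8
}
```

A receiver that decodes such bytes as UTF-8 without any conversion step would reject the first string, which matches the kind of failure reported above.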

Thank you,
Olivier Langlois
Senior C++ Programmer

STREAMTHEWORLD

t. 1 866 448 4037 ext. 675
t. 1 514 448 4037 ext. 675
f. 1 514 807 1861

olivier.langlois@streamtheworld.com
streamtheworld.com
 


Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
I think that using getContent properly would solve my problem. Thanks for
all the help.
-- 
View this message in context: http://www.nabble.com/ActiveMQcpp-cms%3A%3ASession%3A%3AcreateTextMessage%28%29-function-usage-question-with-ISO-8859-1-strings-tp24747592p25263459.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.


Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Wed, 2009-09-02 at 10:06 -0700, martin@schlapfer.com wrote:
> Ah, I see.
> 
> Although I have not tried this, it looks like the getContent() method is
> public, so you could use the getContent() method rather than the getText()
> method.
> 
> You can then transform the payload data in the text message yourself
> (doing what was done in 2.2.1 getText() method) as a work around.
> 

You can grab the raw data from the getContent method; that will work
fine. Just remember that the first four bytes are the size prefix of the
encoded data, and that the data is encoded in Java's Modified UTF-8
format, not plain UTF-8, so the null character (U+0000) is encoded as
0xC0,0x80 rather than 0x00. See:

http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
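
The layout described above can be sketched as follows. This is only an illustration: the function name textFromRawContent is hypothetical, the buffer is assumed to be a copy of what getContent() returns, and the size prefix is assumed to be a big-endian integer as written by Java's DataOutputStream. Only the 0xC0,0x80 null encoding is mapped back; supplementary characters stored as surrogate pairs are left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Extracts the text bytes from a raw body laid out as:
//   [4-byte size prefix][Modified UTF-8 data]
// Assumption: the prefix is big-endian and counts the encoded bytes.
std::string textFromRawContent(const std::vector<unsigned char>& content) {
    if (content.size() <= 4) return std::string();
    const std::uint32_t length =
        (static_cast<std::uint32_t>(content[0]) << 24) |
        (static_cast<std::uint32_t>(content[1]) << 16) |
        (static_cast<std::uint32_t>(content[2]) << 8) |
         static_cast<std::uint32_t>(content[3]);
    if (length > content.size() - 4) {
        throw std::runtime_error("size prefix exceeds the available data");
    }
    const std::size_t end = 4 + static_cast<std::size_t>(length);
    std::string out;
    out.reserve(length);
    for (std::size_t i = 4; i < end; ++i) {
        if (content[i] == 0xC0 && i + 1 < end && content[i + 1] == 0x80) {
            out.push_back('\0');  // Modified UTF-8 form of U+0000
            ++i;                  // skip the 0x80 byte
        } else {
            out.push_back(static_cast<char>(content[i]));
        }
    }
    return out;
}

// Example: prefix 4, then 0xC4 0x84 followed by an encoded null.
void exampleRawContent() {
    const std::vector<unsigned char> raw = {0x00, 0x00, 0x00, 0x04,
                                            0xC4, 0x84, 0xC0, 0x80};
    assert(textFromRawContent(raw) == std::string("\xC4\x84\0", 3));
}
```

The result is still UTF-8 encoded, so a consumer that needs another charset would pass it on to a converter such as iconv.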

Regards
Tim.

-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by ma...@schlapfer.com.
Ah, I see.

Although I have not tried this, it looks like the getContent() method is
public, so you could use the getContent() method rather than the getText()
method.

You can then transform the payload data in the text message yourself
(doing what was done in 2.2.1 getText() method) as a work around.




Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
>> So it should be transparent to the user.  Your example should still work 
>> when encoded to UTF-8, transmitted on the wire, and decoded back into a 
>> character string (ie. treat 0xC4 as one character and 0x84 as the 
>> subsequent character in the std::string).

ActiveMQTextMessage.cpp (version 2.2.1)

std::string ActiveMQTextMessage::getText() const throw( cms::CMSException )
{

    try{
        if( getContent().size() <= 4 ) {
            return "";
        }

        return std::string( (const char*)&getContent()[4], getContent().size()-4 );
    }
    AMQ_CATCH_RETHROW( ActiveMQException )
    AMQ_CATCH_EXCEPTION_CONVERT( Exception, ActiveMQException )
    AMQ_CATCHALL_THROW( ActiveMQException )
} 

ActiveMQTextMessage.cpp (version 3.0.1)

std::string ActiveMQTextMessage::getText() const throw( cms::CMSException )
{

    try{
        if( getContent().size() <= 4 ) {
            return "";
        }

        decaf::io::ByteArrayInputStream bais( getContent() );
        decaf::io::DataInputStream dataIn( &bais );

        return OpenwireStringSupport::readString( dataIn );
    }
    AMQ_CATCH_ALL_THROW_CMSEXCEPTION()
} 

std::string OpenwireStringSupport::readString( decaf::io::DataInputStream& dataIn )
    throw ( decaf::io::IOException ) {

    try {

        int utfLength = dataIn.readInt();

        if( utfLength <= 0 ) {
            return "";
        } 

...

                // a = 0xC4 so, here is a place where fail occurs
                if( a & 0x1C ) {
                    throw UTFDataFormatException(
                        __FILE__, __LINE__,
                        "Invalid 2 byte UTF-8 encoding found, "
                        "This method only supports encoded ASCII values of (0-255)." );
                }
 
...

        return std::string( (char*)( &result[0] ), index );
    }
    AMQ_CATCH_RETHROW( decaf::io::IOException )
    AMQ_CATCH_EXCEPTION_CONVERT( Exception, decaf::io::IOException )
    AMQ_CATCHALL_THROW( decaf::io::IOException )
} 

I don't have access to the producer's Java source, but as far as I know it
reads text from a database and then sends it as a TextMessage to the ActiveMQ
broker. During or before sending, the character 'Ą' (U+0104) is UTF-8 encoded
as the octet stream 0xC4 0x84, so the new getText method cannot convert it back.
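
To make the failing check concrete, here is a simplified sketch of a decoder that accepts only code points up to 255. It is not the ActiveMQ-CPP source; the function name utf8ToSingleByte and the exception types are assumptions for illustration. In a two-byte lead byte 110xxxxx, the bits selected by the mask 0x1C are bits 8 to 10 of the code point, so any of them being set means the value does not fit in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into one char per code point, accepting only U+0000..U+00FF.
// Simplified sketch: overlong two-byte forms are not rejected.
std::string utf8ToSingleByte(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(utf8[i]);
        if (a < 0x80) {                       // ASCII passes through unchanged
            out.push_back(static_cast<char>(a));
            continue;
        }
        // Only two-byte sequences (110xxxxx 10xxxxxx) can stay below 0x100.
        if ((a & 0xE0) != 0xC0 || i + 1 >= utf8.size()) {
            throw std::runtime_error("unsupported or truncated sequence");
        }
        const unsigned char b = static_cast<unsigned char>(utf8[++i]);
        if ((b & 0xC0) != 0x80) {
            throw std::runtime_error("invalid continuation byte");
        }
        // Bits 0x1C of the lead byte are code point bits 8..10.
        if (a & 0x1C) {
            throw std::range_error("code point above U+00FF");
        }
        out.push_back(static_cast<char>(((a & 0x03) << 6) | (b & 0x3F)));
    }
    return out;
}

// Example: U+00E9 (0xC3 0xA9) fits in one byte, U+0104 (0xC4 0x84) does not.
void exampleSingleByteDecode() {
    assert(utf8ToSingleByte("caf\xC3\xA9") == "caf\xE9");
    bool rejected = false;
    try { utf8ToSingleByte("\xC4\x84"); }
    catch (const std::range_error&) { rejected = true; }
    assert(rejected);  // 0xC4 & 0x1C == 0x04, so the check fires
}
```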



Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Martin Schlapfer <ma...@schlapfer.com>.
The string representation remains the same, which is a std::string 
between C++ clients.  It is the *transmission* of the std::string on the 
wire between the client and the server that transforms the string from 
0-255 byte values to UTF-8.

So it should be transparent to the user.  Your example should still work 
when encoded to UTF-8, transmitted on the wire, and decoded back into a 
character string (ie. treat 0xC4 as one character and 0x84 as the 
subsequent character in the std::string).

Unless of course you are talking about a Java client transmitting a 
character value greater than 255 across the wire to a C++ client.  Then 
of course, this will not work.  It would not have worked in previous 
versions either.
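
A minimal sketch of the encoding direction described here, assuming each char in the std::string is treated as a code point in the range 0-255 (the function name singleByteToUtf8 is hypothetical, not the library's own routine):

```cpp
#include <cassert>
#include <string>

// Encodes each byte value 0-255 as its own code point.
// Values 1..127 stay as one byte; 0 and 128..255 become two bytes
// (0 becomes 0xC0 0x80, as Modified UTF-8 requires).
std::string singleByteToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c != 0 && c < 0x80) {
            out.push_back(ch);
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // 110000xx
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // 10xxxxxx
        }
    }
    return out;
}

// Example: the bytes 0xC4 and 0x84 are encoded as two separate characters.
void exampleSingleByteEncode() {
    assert(singleByteToUtf8("\xC4\x84") == "\xC3\x84\xC2\x84");
    assert(singleByteToUtf8("caf\xE9") == "caf\xC3\xA9");
}
```

Under this assumption a std::string that already holds UTF-8 survives a round trip between two C++ clients byte for byte, but a Java consumer would see two characters (U+00C4 and U+0084) instead of one.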

rainy3 said the following on 9/1/2009 8:52 AM:
>> The CMS API doesn't currently have methods for dealing with multibyte C++ 
>> strings so strings are expected to be encoded with values from 0-255.
>>     
>
> So, with the new ActiveMQcpp, a C++ consumer won't be able to convert back a
> text message containing a UTF-8 character such as 0xC484 (these two bytes
> stand for one letter, 'Ą'). It's kind of sad, because the old consumer was
> able to deal with this (it used iconv to convert the text message from UTF-8
> to the proper charset, and it worked like a charm). Now, I guess, the only
> way to keep this possible is to rewrite both producer and consumer to use
> binary messages. Maybe, in a future release of ActiveMQcpp, there could be
> some kind of system-dependent compilation that allows Linux/Unix systems
> with native iconv support to deal with this kind of text message?
>   


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
> The CMS API doesn't currently have methods for dealing with multibyte C++ 
> strings so strings are expected to be encoded with values from 0-255.

So, with the new ActiveMQcpp, a C++ consumer won't be able to convert back a
text message containing a UTF-8 character such as 0xC484 (these two bytes
stand for one letter, 'Ą'). It's kind of sad, because the old consumer was
able to deal with this (it used iconv to convert the text message from UTF-8
to the proper charset, and it worked like a charm). Now, I guess, the only
way to keep this possible is to rewrite both producer and consumer to use
binary messages. Maybe, in a future release of ActiveMQcpp, there could be
some kind of system-dependent compilation that allows Linux/Unix systems
with native iconv support to deal with this kind of text message?


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Mon, 2009-08-31 at 08:52 -0700, rainy3 wrote:
> Hello,
> 
> I just need more clarification on this case. Assume that I have an ActiveMQ
> broker, a Java producer, and a C++ consumer for text messages. The producer
> reads some text from a database, for example in the ISO-8859-1 charset. Then
> it creates a text message (conversion to UTF-8) and sends it to the broker.
> Finally, the consumer won't be able to read these text messages because the
> backward conversion (C++ client) allows only ASCII chars, right?

When the String value in the TextMessage (Java Strings are in UTF-16) is
written on the wire, it's converted to a modified UTF-8 representation.
The C++ client expects all the string values it reads from the Broker to
be in this representation, so it knows how to read and convert the UTF-8
values back to single byte char values.  The CMS API doesn't currently
have methods for dealing with multibyte C++ strings, so strings are
expected to be encoded with values from 0-255.  For ISO-8859-1 that
should be OK since, as I recall, it defines 191 printable characters that
all fall within 0-255, so your strings should be read fine.

Of course the C++ string you get out is going to be backed by a char
array, where char is typically a signed byte, but you can always cast it
to an unsigned char* and use the full range 0-255 as you please.
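
A small sketch of the cast mentioned above. The helper name byteValues is hypothetical; it only shows how to read each char of a std::string as a value in 0-255 regardless of whether char is signed on the platform.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the value of every byte in `text` as a number in 0-255.
// The cast matters because a plain char may be signed, in which case
// bytes above 127 would otherwise show up as negative numbers.
std::vector<unsigned int> byteValues(const std::string& text) {
    std::vector<unsigned int> values;
    values.reserve(text.size());
    for (char ch : text) {
        values.push_back(static_cast<unsigned char>(ch));
    }
    return values;
}

// Example: the ISO-8859-1 byte 0xE9 reads back as 233, not as -23.
void exampleByteValues() {
    const std::vector<unsigned int> values = byteValues("A\xE9");
    assert(values.size() == 2);
    assert(values[0] == 65);
    assert(values[1] == 233);
}
```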

Does that help?

Regards
Tim.


-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by rainy3 <ra...@poczta.onet.pl>.
Hello,

I just need more clarification on this case. Assume that I have an ActiveMQ
broker, a Java producer, and a C++ consumer for text messages. The producer
reads some text from a database, for example in the ISO-8859-1 charset. Then
it creates a text message (conversion to UTF-8) and sends it to the broker.
Finally, the consumer won't be able to read these text messages because the
backward conversion (C++ client) allows only ASCII chars, right?


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Thu, 2009-07-30 at 18:21 -0400, Olivier Langlois wrote:
> Tim,
> 
> Would adding a cms::Session::createUTF8TextMessage( const std::string & ) function to the CMS API be something acceptable for the project maintainers?
> 
> > > Is there a way to tell the API to skip the UTF-8 conversion?
> > >
> > > Greetings,
> > > Olivier
> > >
> > 

My suggestion would be to create a new Jira issue to request the sort of
functionality that you need.  However, I don't think that it would be
implemented as a separate method in the near future, as the CMS API is
sealed until the next major version release, which isn't currently
planned.  Another thing to consider is that there are variations of
UTF-8 encoding, so providing the method as proposed is tricky: it's
hard to check that the user has supplied a string in "Modified UTF-8",
which is the format used by Java, and strings in other flavors of UTF-8
would not marshal correctly because Modified UTF-8 is not fully
compatible with UTF-8 proper.

See: http://en.wikipedia.org/wiki/UTF-8#UTF-8_derivations
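
As a rough illustration of the compatibility problem: Modified UTF-8 differs from standard UTF-8 in how it encodes U+0000 (as 0xC0,0x80) and characters above U+FFFF (as two three-byte surrogates instead of one four-byte sequence). The sketch below, with the hypothetical name utf8IsAlsoModifiedUtf8, assumes its input is already well-formed standard UTF-8 and reports whether it can be passed along unchanged.

```cpp
#include <cassert>
#include <string>

// Assumes `utf8` is well-formed standard UTF-8.
// Returns true when the same bytes are also valid Modified UTF-8.
bool utf8IsAlsoModifiedUtf8(const std::string& utf8) {
    for (char ch : utf8) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c == 0x00) return false;  // U+0000 must be written as 0xC0 0x80
        if (c >= 0xF0) return false;  // four-byte sequences are not allowed
    }
    return true;
}

// Example: Latin text is unaffected, an embedded null or an emoji is not.
void exampleModifiedUtf8Check() {
    assert(utf8IsAlsoModifiedUtf8("caf\xC3\xA9"));
    assert(!utf8IsAlsoModifiedUtf8(std::string("a\0b", 3)));
    assert(!utf8IsAlsoModifiedUtf8("\xF0\x9F\x98\x80"));  // U+1F600
}
```

For most text without embedded nulls or supplementary characters the two encodings coincide, which is why the difference is easy to miss.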

The quickest solution to the problem you are facing would be to send the
message as a BytesMessage on the CPP client side, with the payload being
your UTF-8 encoded string, and then employ the Message Transformation
mechanism of ActiveMQ to turn those messages back into TextMessage
instances on the Java client side.  You could use a message property to
flag the messages as needing transformation, or you could send them to a
dedicated Topic or Queue and transform all BytesMessages to
TextMessages.  Be aware that if your data isn't encoded as Modified
UTF-8, you would need to account for that in the transformation.

see: http://activemq.apache.org/message-transformation.html

Regards
Tim.




-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Olivier Langlois <Ol...@streamtheworld.com>.
Tim,

Would adding a cms::Session::createUTF8TextMessage( const std::string & ) function to the CMS API be something acceptable for the project maintainers?

> > Is there a way to tell the API to skip the UTF-8 conversion?
> >
> > Greetings,
> > Olivier
> >
> 
> Your only option at the moment is to send them as a payload in a
> BytesMessage, the CMS API doesn't define any methods that support
> Unicode strings currently.
> 
> Regards
> Tim.
> 
> --
> Tim Bish
> http://fusesource.com
> http://timbish.blogspot.com/
> 
> 


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Thu, 2009-07-30 at 18:02 -0400, Olivier Langlois wrote:
> Tim,
> 
> Thank you for your answer. However, that brings up another question :-). This is not something I need, but what could someone do who has UTF-8 strings (containing Japanese text coming from a db, for instance, and stored in C++ std::string objects) and wants to send the messages as is to the broker with activemqcpp?
> 
> Is there a way to tell the API to skip the UTF-8 conversion?
> 
> Greetings,
> Olivier
> 

Your only option at the moment is to send them as a payload in a
BytesMessage, the CMS API doesn't define any methods that support
Unicode strings currently.

Regards
Tim.

-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Olivier Langlois <Ol...@streamtheworld.com>.
Tim,

Thank you for your answer. However, that brings up another question :-). This is not something I need, but what could someone do who has UTF-8 strings (containing Japanese text coming from a db, for instance, and stored in C++ std::string objects) and wants to send the messages as is to the broker with activemqcpp?

Is there a way to tell the API to skip the UTF-8 conversion?

Greetings,
Olivier

> > Does the createTextMessage() function expect UTF-8 strings, or are regular
> > ASCII strings fine?
> >
> All methods in the CMS API that have string parameters expect standard C++
> strings that have been populated with single-byte character values in the
> range 0-255. The UTF-8 conversion happens when the message is sent to the
> ActiveMQ broker, which is a Java application whose strings are written on
> the wire in modified UTF-8.
> 
> Regards
> Tim.
> 
> 
> 
> 
> --
> Tim Bish
> http://fusesource.com
> http://timbish.blogspot.com/
> 
> 


RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Thu, 2009-07-30 at 17:42 -0400, Olivier Langlois wrote:
> Hi Tim,
> 
> It is an old version, 2.1.3. Are you referring to:
> 
> http://issues.apache.org/activemq/browse/AMQCPP-235 ?
> 
> But with or without the bug, I still have my question of whether or not I am using the ActiveMQcpp API correctly.
> 
> Does the createTextMessage() function expect UTF-8 strings, or are regular ASCII strings fine?
> 
All methods in the CMS API that have string parameters expect standard C++
strings that have been populated with single-byte character values in the
range 0-255.  The UTF-8 conversion happens when the message is sent to the
ActiveMQ broker, which is a Java application whose strings are written on
the wire in modified UTF-8.

Regards
Tim.




-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/




RE: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Olivier Langlois <Ol...@streamtheworld.com>.
Hi Tim,

It is an old version, 2.1.3. Are you referring to:

http://issues.apache.org/activemq/browse/AMQCPP-235 ?

But with or without the bug, I still have my question of whether or not I am using the ActiveMQcpp API correctly.

Does the createTextMessage() function expect UTF-8 strings, or are regular ASCII strings fine?

Thank you very much!

> 
> Without knowing what version of ActiveMQ-CPP you are using, I can't give you
> a definite answer. There were issues in the 2.x versions with character
> values greater than 127; those have been addressed and should work in either
> 2.2.6 or 3.0.1.  If you continue to have problems with the latest version of the
> client then I'd recommend opening a new Jira issue and attaching a
> simple but complete test case that demonstrates the issue.
> 
> Regards
> Tim.
> 
> --
> Tim Bish
> http://fusesource.com
> http://timbish.blogspot.com/
> 
> 


Re: ActiveMQcpp cms::Session::createTextMessage() function usage question with ISO-8859-1 strings

Posted by Timothy Bish <ta...@gmail.com>.
On Thu, 2009-07-30 at 17:24 -0400, Olivier Langlois wrote:
> Hi,
> 
> An exception is thrown by the java receiving side:
> 
> Caused by: java.io.UTFDataFormatException
>         at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:366)
>         at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:86)
>         ... 17 more
> 
> The producer of this message uses ActiveMQcpp, and the string passed to createTextMessage() uses the ISO-8859-1 charset; it is not encoded in UTF-8. That string can contain characters with a value higher than 127, which may result in an invalid UTF-8 string.
> 
> I have checked the activemqcpp HTML doxygen documentation for this function to look for an answer, but the function description does not specify the requirements on the input string parameter. Does the function expect the string to be UTF-8?

Without knowing what version of ActiveMQ-CPP you are using, I can't give you
a definite answer. There were issues in the 2.x versions with character
values greater than 127; those have been addressed and should work in either
2.2.6 or 3.0.1.  If you continue to have problems with the latest version of the
client then I'd recommend opening a new Jira issue and attaching a
simple but complete test case that demonstrates the issue.

Regards
Tim.

-- 
Tim Bish
http://fusesource.com
http://timbish.blogspot.com/