Posted to dev@tuscany.apache.org by Simon Laws <si...@googlemail.com> on 2010/11/17 16:54:38 UTC

Multi-byte character support?

Does anyone know if there is any support for, or any Tuscany tests covering,
multi-byte character sets in any of the bindings/databindings?

Simon

-- 
Apache Tuscany committer: tuscany.apache.org
Co-author of a book about Tuscany and SCA: tuscanyinaction.com

Re: Multi-byte character support?

Posted by Simon Laws <si...@googlemail.com>.
On Wed, Nov 17, 2010 at 5:58 PM, Simon Laws <si...@googlemail.com> wrote:

I raised TUSCANY-3790 to track this.

Simon



Re: Multi-byte character support?

Posted by Simon Laws <si...@googlemail.com>.
On Wed, Nov 17, 2010 at 5:31 PM, Raymond Feng <cy...@gmail.com> wrote:
Right, there is some questionable code in places. For example:

public class String2OMElement extends BaseTransformer<String, OMElement>
    implements PullTransformer<String, OMElement> {

    @SuppressWarnings("unchecked")
    public OMElement transform(String source, TransformationContext context) {
        try {
            // getBytes() with no argument encodes using the platform default charset
            StAXOMBuilder builder =
                new StAXOMBuilder(new ByteArrayInputStream(source.getBytes()));
            OMElement element = builder.getDocumentElement();
            AxiomHelper.adjustElementName(context, element);
            return element;
        } catch (Exception e) {
            throw new TransformationException(e);
        }
    }
    // ...
}

Here it calls source.getBytes() with no encoding, so the platform default
charset is used. I'm assuming that we don't test with various encodings
that would expose issues like this, but wanted to check.
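A minimal, JDK-only sketch of the problem above and one way around it. It assumes a plain StAX parser (javax.xml.stream) is a fair stand-in for Axiom's StAXOMBuilder; the class and method names are hypothetical. Handing the parser a StringReader means it never sees bytes, so no charset is involved. Encoding with a charset other than UTF-8 (standing in for a non-UTF-8 platform default) corrupts the text, because with no XML declaration the parser decodes the bytes as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not Tuscany code.
final class XmlFromStringDemo {

    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Concatenates all character data the parser reports, then closes the reader.
    private static String textOf(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    sb.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    // Preferred: give the parser characters via a Reader, so no byte encoding is involved.
    static String parseFromReader(String xml) {
        try {
            return textOf(FACTORY.createXMLStreamReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the source.getBytes() pattern; 'charset' stands in for whatever the
    // platform default happens to be. Without an XML declaration the parser
    // assumes UTF-8, so any other charset can corrupt the text.
    static String parseFromBytes(String xml, Charset charset) {
        try {
            byte[] bytes = xml.getBytes(charset);
            return textOf(FACTORY.createXMLStreamReader(new ByteArrayInputStream(bytes)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<greeting>\u65e5\u672c\u8a9e</greeting>";
        System.out.println(parseFromReader(xml).equals("\u65e5\u672c\u8a9e"));                        // true
        System.out.println(parseFromBytes(xml, StandardCharsets.UTF_8).equals("\u65e5\u672c\u8a9e")); // true
        // ISO-8859-1 cannot represent these characters, so getBytes() replaces each with '?'.
        System.out.println(parseFromBytes(xml, StandardCharsets.ISO_8859_1));                       // ???
    }
}
```

In String2OMElement the analogous change would presumably be to build from an XMLStreamReader over a StringReader (assuming the Axiom version in use offers a StAXOMBuilder(XMLStreamReader) constructor), or at least to call getBytes with an explicit "UTF-8". The Reader route is safer: encoding to UTF-8 only works if the string carries no XML declaration naming a different encoding. This is a suggestion only.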

Simon



Re: Multi-byte character support?

Posted by Raymond Feng <cy...@gmail.com>.
Hi,

Java Strings are Unicode (UTF-16 internally). The tricky part is converting between String and byte[] in either direction (sometimes through streaming APIs). We need to make sure we specify the correct encoding, such as UTF-8, instead of relying on the default one, which is platform dependent.
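A minimal sketch of that point using only the JDK: name the charset on every String/byte[] conversion, including when wrapping streams in a Reader or Writer. The class and method names are hypothetical; StandardCharsets needs Java 7 or later (on older JDKs the charset name "UTF-8" can be passed instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not Tuscany code.
final class ExplicitCharsetDemo {

    // String -> byte[]: name the charset instead of calling getBytes() with no
    // argument, which would use the platform default.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String: decode with the same charset that was used to encode.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Streaming write: wrap the OutputStream in a Writer with an explicit charset.
    static byte[] writeUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the Writer is closed (and flushed) by this point
    }

    // Streaming read: wrap the InputStream in a Reader with an explicit charset.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three CJK characters, 3 bytes each in UTF-8
        byte[] utf8 = encode(text);
        System.out.println(utf8.length);                                                    // 9
        System.out.println(decode(utf8).equals(text));                                      // true
        System.out.println(readUtf8(new ByteArrayInputStream(writeUtf8(text))).equals(text)); // true
        // Decoding the same bytes with the wrong charset silently yields 9 garbage chars.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).length());         // 9
    }
}
```

Note the last line: a mismatched decode does not throw, it just produces the wrong characters, which is why this class of bug tends to go unnoticed without multi-byte test data.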

Thanks,
Raymond 
________________________________________________________________ 
Raymond Feng
rfeng@apache.org
Apache Tuscany PMC member and committer: tuscany.apache.org
Co-author of Tuscany SCA In Action book: www.tuscanyinaction.com
Personal Web Site: www.enjoyjava.com
________________________________________________________________

On Nov 17, 2010, at 7:54 AM, Simon Laws wrote:

> Anyone know if there is any support for or any Tuscany tests for
> multi-byte character set support in any of the bindings/databindings?
> 
> Simon
> 
> -- 
> Apache Tuscany committer: tuscany.apache.org
> Co-author of a book about Tuscany and SCA: tuscanyinaction.com