Posted to user@avro.apache.org by nick lawson <ni...@hotmail.co.uk> on 2017/03/17 09:06:01 UTC

avro-tools not serialising multibyte chars today

In the past I have used the avro-tools jar "fromjson" to convert a json file
containing utf-8 multibyte chars to avro as expected.  This data is type
"bytes" in the schema. 

Today this isn't working for me - instead the multibyte characters are each
represented in my avro output as a single ? (question mark).

No doubt this is due to me changing something in my environment. Does anyone
know what I need to set/download to get back to normal running?
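
One way a multibyte character can collapse into a single ? on the JVM is an
encoding step that uses a charset unable to represent it. The sketch below is
only an illustration of that mechanism, not a diagnosis of this report: the
class and method names are hypothetical and it does not call avro-tools.
ISO-8859-1 is included because, as I read the Avro specification, JSON strings
for "bytes" values map code points 0-255 to byte values; whether that mapping
or the default charset is involved here is not established in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how unmappable characters become '?' (0x3f).
public class CharsetLossDemo {
    // String.getBytes(Charset) replaces every unmappable character with the
    // charset's replacement byte, which is '?' for US-ASCII and ISO-8859-1.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Render bytes as space-separated lowercase hex for easy comparison.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u20ac5"; // euro sign followed by the digit 5
        System.out.println(hex(encode(text, StandardCharsets.UTF_8)));      // e2 82 ac 35
        System.out.println(hex(encode(text, StandardCharsets.US_ASCII)));   // 3f 35
        System.out.println(hex(encode(text, StandardCharsets.ISO_8859_1))); // 3f 35
    }
}
```

If the bytes in the avro output are 3f where the UTF-8 sequence was expected,
the characters were lost at an encoding step like this rather than at display
time.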

Thanks,

Nick

 



--
View this message in context: http://apache-avro.679487.n3.nabble.com/avro-tools-not-serialising-multibyte-chars-today-tp4037037.html
Sent from the Avro - Users mailing list archive at Nabble.com.

Re: avro-tools not serialising multibyte chars today

Posted by Yibing Shi <ys...@cloudera.com>.
Have you set up the locale properly? What is the output of your "locale"
and "locale -a" command?
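
To see what the JVM itself derived from the locale, a small report like the
following may help alongside the "locale" output. This is a minimal sketch with
a hypothetical class name; "sun.jnu.encoding" is an implementation-specific
property and may be absent on some JVMs, in which case it prints as null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the encoding-related settings the JVM sees.
public class EncodingReport {
    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        // The charset used by APIs that do not take an explicit charset.
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // System properties related to encoding and locale.
        for (String key : new String[] {"file.encoding", "sun.jnu.encoding", "user.language"}) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        // Environment variables the locale is usually taken from on Unix-like systems.
        for (String key : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            report.put("env " + key, String.valueOf(System.getenv(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it in the same shell, with the same java binary used for avro-tools,
shows whether the default charset is something other than UTF-8.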

*Yibing Shi*
*Customer Operations Engineer*
<http://www.cloudera.com>

On Tue, Mar 28, 2017 at 6:11 PM, nick lawson <ni...@hotmail.co.uk>
wrote:

> Doug Cutting wrote
> > Maybe your JVM's default charset has changed?  Try -Dfile.encoding="UTF-8"
> > when you start Java.
>
> Doug,
> Thanks, but no, it wasn't that.
>
> The effect I'm seeing is the same sort of thing as if I had been trying to
> display characters without a font that would render them (except that I'm
> not trying to display them!)
>
> I'll keep digging.
>
> Thanks,
>
> Nick

Re: avro-tools not serialising multibyte chars today

Posted by nick lawson <ni...@hotmail.co.uk>.
Doug Cutting wrote
> Maybe your JVM's default charset has changed?  Try -Dfile.encoding="UTF-8"
> when you start Java.

Doug,
Thanks, but no, it wasn't that.

The effect I'm seeing is the same sort of thing as if I had been trying to
display characters without a font that would render them (except that I'm
not trying to display them!)

I'll keep digging.

Thanks,

Nick




Re: avro-tools not serialising multibyte chars today

Posted by Doug Cutting <cu...@apache.org>.
Maybe your JVM's default charset has changed?  Try -Dfile.encoding="UTF-8"
when you start Java.

Even if that fixes things, it's perhaps still a bug.  The tool should
probably not depend on the default charset, but should explicitly set its
expected input encoding.  So, if that's the problem, please file an issue.
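
The difference between relying on the default charset and setting the input
encoding explicitly can be sketched as follows. This is an illustration under
assumptions, not the avro-tools code: the class and method names are
hypothetical, and the input is an in-memory UTF-8 JSON fragment rather than a
real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same bytes with different charsets.
public class ExplicitEncodingDemo {
    // Decode the whole stream using the charset passed in by the caller.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A JSON fragment containing one non-ASCII character, stored as UTF-8.
        byte[] utf8Json = "{\"name\":\"caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);

        // Explicit charset: the result does not depend on the environment.
        String explicit = readAll(new ByteArrayInputStream(utf8Json), StandardCharsets.UTF_8);

        // Default charset: the result depends on locale and -Dfile.encoding.
        String viaDefault = readAll(new ByteArrayInputStream(utf8Json), Charset.defaultCharset());

        System.out.println("default charset is " + Charset.defaultCharset());
        System.out.println("same result: " + explicit.equals(viaDefault));
    }
}
```

With a UTF-8 default the two results match; with a default such as US-ASCII the
second read yields replacement characters, which is the kind of
environment-dependent behaviour an explicit charset avoids.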

Doug

On Mar 17, 2017 2:06 AM, "nick lawson" <ni...@hotmail.co.uk> wrote:

> In the past I have used the avro-tools jar "fromjson" to convert a json file
> containing utf-8 multibyte chars to avro as expected.  This data is type
> "bytes" in the schema.
>
> Today this isn't working for me - instead the multibyte characters are each
> represented in my avro output as a single ? (question mark).
>
> No doubt this is due to me changing something in my environment. Does anyone
> know what I need to set/download to get back to normal running?
>
> Thanks,
>
> Nick