Posted to user@avro.apache.org by MCKEAG Tory <to...@alstom.com> on 2016/11/02 22:24:04 UTC

Java / C++ interop and strings?

Hi, I have some data that I serialized with Java and would like to deserialize in a C++ application, using GenericRecord in both languages.  I've discovered that before each string field, the Java serialization writes an extra byte with the value 2 (perhaps the Unicode start-of-text character, U+0002?  I have no idea why this byte is there).  After that byte comes the count of characters in the string and then the string data itself.
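
For reference, the layout described above can be compared against the Avro binary encoding. There, a string is written as a zigzag varint length (a count of UTF-8 bytes) followed by the UTF-8 data, and a union value is prefixed with the zigzag-encoded index of its branch. The sketch below uses only the Java standard library, not the Avro API, and its class and method names are hypothetical. It shows that a string stored in branch 1 of a union such as ["null", "string"] would begin with a byte of value 2. Whether the schema in this thread declares its strings that way is an assumption that nothing in the thread confirms.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: hand-rolled Avro binary layout.
public class AvroStringLayout {
    // Zigzag-encode a long, then write it as a base-128 varint,
    // least significant group first.
    public static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A plain Avro string: UTF-8 byte length, then the UTF-8 bytes.
    public static byte[] encodeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // A string inside a union: branch index first, then the string.
    // ASSUMPTION: the union schema is not confirmed by the thread.
    public static byte[] encodeUnionString(int branchIndex, String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(branchIndex, out);
        byte[] body = encodeString(s);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Render bytes as space-separated two-digit hex values.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeString("foo")));         // 06 66 6f 6f
        System.out.println(hex(encodeUnionString(1, "foo"))); // 02 06 66 6f 6f
    }
}
```

Note that the length prefix is itself zigzag encoded, so a 3-byte string is prefixed with 0x06 rather than 0x03. A hex dump of the Java output can be checked against that to see whether the byte after the 2 matches this layout.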

The C++ decoder layer is choking on this extra byte.  When I serialize the same data using C++, the extra byte isn't there and everything works.  Also, when I use Java to deserialize the data written WITH the extra byte, it seems to work fine (which I guess makes sense, given that the Java serializer added the extra byte).
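
One way a reader could choke on such a byte is a schema mismatch. The sketch below assumes, without confirmation from this thread, that the writer wrapped the string in a union while the reader expects a plain string. It uses only the Java standard library, the class and method names are hypothetical, and it stands in for what any Avro binary decoder would do in that situation. The reader takes the 0x02 as a length of 1 and then falls out of step with the rest of the data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: hand-rolled Avro binary reads.
public class AvroStringRead {
    // Read a base-128 varint, then undo the zigzag encoding.
    public static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Read a plain string: byte length, then that many UTF-8 bytes.
    public static String readString(ByteBuffer in) {
        int len = (int) readLong(in);
        byte[] utf8 = new byte[len];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Read a string from a union whose string branch has index 1.
    // ASSUMPTION: the union schema is not confirmed by the thread.
    public static String readUnionString(ByteBuffer in) {
        long branch = readLong(in);
        if (branch != 1) {
            throw new IllegalStateException("unexpected union branch " + branch);
        }
        return readString(in);
    }

    public static void main(String[] args) {
        byte[] data = {0x02, 0x06, 'f', 'o', 'o'};

        // Reader that expects a plain string: 0x02 decodes to length 1,
        // so the "string" is the single byte 0x06.
        String wrong = readString(ByteBuffer.wrap(data));
        System.out.println(wrong.length() + " " + (int) wrong.charAt(0)); // 1 6

        // Reader that expects the union: branch 1, then the real string.
        System.out.println(readUnionString(ByteBuffer.wrap(data)));      // foo
    }
}
```

If this is what is happening, the fix would be to give the C++ reader the same schema that the Java writer used, rather than to change any encoder or decoder option.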

I've been testing with the official 1.8.1 release of both the Java and C++ libraries.  Any idea what's going on?  I've looked for configuration options on the encoders and decoders in both languages but didn't find much.  Is there any additional information I should provide?

________________________________
CONFIDENTIALITY : This e-mail and any attachments are confidential and may be privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose or store or copy the information in any medium.

RE: Java / C++ interop and strings?

Posted by MCKEAG Tory <to...@alstom.com>.
Thanks Devendra, that's helpful.  However, I'm not using any compression, and I still have these extra bytes when serializing from Java.  Any clues?


RE: Java / C++ interop and strings?

Posted by Devendra Tomar <dt...@sapient.com>.
Hi Tory,

>> I've been testing using the 1.8.1 official release of both the Java and C++ libraries.  Any idea what's going on?  I've been looking for some kind of config options on the encoders/decoders for either language, didn't find much.  Any specific additional info I should provide?

In Java there is a class called CodecFactory that provides different compression codecs. You can find the details here:
https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/file/CodecFactory.html#method_summary

Here are some example uses of this class:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kite-bundle/nifi-kite-processors/src/main/java/org/apache/nifi/processors/kite/ConvertJSONToAvro.java#L132
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kite-bundle/nifi-kite-processors/src/main/java/org/apache/nifi/processors/kite/AbstractKiteConvertProcessor.java#L45

Hope it helps!

Regards
Devendra Tomar
