Posted to user@flink.apache.org by Lasse Nedergaard <la...@gmail.com> on 2020/07/16 11:28:28 UTC

Byte arrays in Avro

Hi.

We have some Avro objects, and some of them contain the primitive data
type bytes, which is translated into java.nio.ByteBuffer in the generated
Avro objects. When using our Avro objects we get these warnings:

org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a getter for field hb
org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a setter for field hb
org.apache.flink.api.java.typeutils.TypeExtractor [] - Class class java.nio.ByteBuffer cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.

and it's correct that ByteBuffer doesn't contain a getter or setter for hb.
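
For illustration, here is a minimal, self-contained sketch (a hand-written
Holder class standing in for our Avro-generated record, not the real thing)
that should reproduce the same fallback to GenericType when Flink analyzes
the type:

import java.nio.ByteBuffer;

import org.apache.flink.api.common.typeinfo.TypeInformation;

public class ByteBufferTypeCheck {

    // Hand-written stand-in for an Avro-generated record: Avro's "bytes"
    // primitive ends up as java.nio.ByteBuffer on the generated class.
    public static class Holder {
        public ByteBuffer payload;

        public Holder() {
        }
    }

    public static void main(String[] args) {
        // Extracting the type information analyzes Holder as a POJO; the
        // ByteBuffer field has no usable getters/setters internally, so it
        // is treated as a GenericType and the TypeExtractor warnings appear.
        TypeInformation<Holder> typeInfo = TypeInformation.of(Holder.class);
        System.out.println(typeInfo);
    }
}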

The Flink documentation says "Note that Flink is automatically serializing
POJOs generated by Avro with the Avro serializer.", but when I debug it,
it looks like Flink falls back to a generic type for the ByteBuffer field,
which would be consistent with the warnings.

I want to ensure we are running as efficiently as possible.

So my questions are:
1. What is the most efficient way to transport byte arrays in Avro in Flink?
2. Does Flink use the Avro serializer for our Avro objects when they contain
a ByteBuffer?


Thanks

Lasse Nedergaard

Re: Byte arrays in Avro

Posted by Timo Walther <tw...@apache.org>.
I investigated this issue further. We are analyzing the class as a POJO 
in another step here, which produces the warning:

https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroTypeInfo.java#L71

However, the serializer is definitely the `AvroSerializer` if the type 
information is `AvroTypeInfo`. You can check that via `dataStream.getType`.
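
For example, roughly like this (MyAvroRecord is a placeholder for your
generated specific record class):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializerCheck {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // MyAvroRecord is a placeholder for the Avro-generated specific record.
        DataStream<MyAvroRecord> stream = env.fromElements(new MyAvroRecord());

        TypeInformation<MyAvroRecord> typeInfo = stream.getType();

        // An AvroTypeInfo here means the AvroSerializer is used for this type;
        // a GenericTypeInfo would mean the fallback to Kryo instead.
        System.out.println(typeInfo);
        System.out.println("Uses Avro serializer: " + (typeInfo instanceof AvroTypeInfo));
    }
}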

I hope this helps.

Regards,
Timo

On 16.07.20 14:28, Timo Walther wrote:
> Hi Lasse,
> 
> Are you using Avro specific records? A look into the code shows that the
> warnings in the log are generated after the Avro check:
> 
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java#L1741 
> 
> 
> Perhaps your Avro object is somehow not being recognized correctly?
> 
> Regards,
> Timo
> 
> On 16.07.20 13:28, Lasse Nedergaard wrote:
>> Hi.
>>
>> We have some Avro objects, and some of them contain the primitive data
>> type bytes, which is translated into java.nio.ByteBuffer in the generated
>> Avro objects. When using our Avro objects we get these warnings:
>>
>> org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a getter for field hb
>> org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a setter for field hb
>> org.apache.flink.api.java.typeutils.TypeExtractor [] - Class class java.nio.ByteBuffer cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
>>
>> and it's correct that ByteBuffer doesn't contain a getter or setter for hb.
>>
>> The Flink documentation says "Note that Flink is automatically serializing
>> POJOs generated by Avro with the Avro serializer.", but when I debug it,
>> it looks like Flink falls back to a generic type for the ByteBuffer field,
>> which would be consistent with the warnings.
>>
>> I want to ensure we are running as efficiently as possible.
>>
>> So my questions are:
>> 1. What is the most efficient way to transport byte arrays in Avro in Flink?
>> 2. Does Flink use the Avro serializer for our Avro objects when they contain
>> a ByteBuffer?
>>
>>
>> Thanks
>>
>> Lasse Nedergaard
> 


Re: Byte arrays in Avro

Posted by Timo Walther <tw...@apache.org>.
Hi Lasse,

Are you using Avro specific records? A look into the code shows that the
warnings in the log are generated after the Avro check:

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java#L1741

Perhaps your Avro object is somehow not being recognized correctly?
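
If it helps to narrow that down, here is a rough sketch (MyAvroRecord is a
placeholder for a generated specific record class) for checking whether the
class itself is picked up as an Avro type, and for supplying the type
information explicitly if it is not:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;

public class AvroRecognitionCheck {

    public static void main(String[] args) {
        // MyAvroRecord is a placeholder for the Avro-generated specific record
        // (it has to extend org.apache.avro.specific.SpecificRecordBase).
        TypeInformation<MyAvroRecord> typeInfo = TypeInformation.of(MyAvroRecord.class);

        // Expected: an AvroTypeInfo. A plain PojoTypeInfo or GenericTypeInfo
        // would mean the class was not picked up as an Avro specific record.
        System.out.println(typeInfo.getClass().getName());
        System.out.println("Recognized as Avro: " + (typeInfo instanceof AvroTypeInfo));

        // If it is not recognized automatically, the type information can also
        // be supplied explicitly on an operator, for example:
        //   stream.map(...).returns(new AvroTypeInfo<>(MyAvroRecord.class));
    }
}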

Regards,
Timo

On 16.07.20 13:28, Lasse Nedergaard wrote:
> Hi.
> 
> We have some Avro objects, and some of them contain the primitive data
> type bytes, which is translated into java.nio.ByteBuffer in the generated
> Avro objects. When using our Avro objects we get these warnings:
> 
> org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a getter for field hb
> org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a setter for field hb
> org.apache.flink.api.java.typeutils.TypeExtractor [] - Class class java.nio.ByteBuffer cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
> 
> and it's correct that ByteBuffer doesn't contain a getter or setter for hb.
> 
> The Flink documentation says "Note that Flink is automatically serializing
> POJOs generated by Avro with the Avro serializer.", but when I debug it,
> it looks like Flink falls back to a generic type for the ByteBuffer field,
> which would be consistent with the warnings.
> 
> I want to ensure we are running as efficiently as possible.
> 
> So my questions are:
> 1. What is the most efficient way to transport byte arrays in Avro in Flink?
> 2. Does Flink use the Avro serializer for our Avro objects when they contain
> a ByteBuffer?
> 
> 
> Thanks
> 
> Lasse Nedergaard