Posted to user@flink.apache.org by Rinat <r....@cleverdata.ru> on 2018/10/04 17:14:09 UTC

[deserialization schema] skip data, that couldn't be properly deserialized

Hi mates, according to the contract of org.apache.flink.formats.avro.DeserializationSchema, a deserializer should return null when the content cannot be deserialized.
In most cases, however (for example org.apache.flink.formats.avro.AvroDeserializationSchema), the method fails if the data is corrupted.

We've implemented our own SerDe class that returns null if the data doesn't satisfy the Avro schema, but this functionality is rather hard to maintain when migrating to the latest Flink version.
What do you think: would it be useful to support optionally skipping failed records in the Avro and other deserializers in the Flink source code?
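
To make the idea concrete, here is a minimal sketch of a "skip on failure" wrapper. It does not use Flink's classes: RecordDeserializer is a hypothetical stand-in interface, and SkippingDeserializer, skippedCount and the demo class are illustrative names only. It shows the general pattern of delegating to a strict deserializer and returning null when the delegate fails, so the caller can drop the record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Small demo: parse three raw records and drop the one that cannot be parsed.
class SkipCorruptRecordsDemo {
    public static void main(String[] args) {
        // A strict deserializer that throws NumberFormatException on bad input.
        RecordDeserializer<Integer> strict =
                bytes -> Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        SkippingDeserializer<Integer> lenient = new SkippingDeserializer<>(strict);

        List<Integer> parsed = new ArrayList<>();
        for (String raw : new String[] {"1", "oops", "3"}) {
            Integer value = lenient.deserialize(raw.getBytes(StandardCharsets.UTF_8));
            if (value != null) {   // null means "could not be deserialized, skip it"
                parsed.add(value);
            }
        }
        System.out.println(parsed + " skipped=" + lenient.skippedCount());
    }
}

// Hypothetical stand-in for a deserialization contract; NOT Flink's actual interface.
interface RecordDeserializer<T> {
    T deserialize(byte[] message) throws IOException;
}

// Wraps another deserializer and maps any failure to null instead of propagating it.
final class SkippingDeserializer<T> implements RecordDeserializer<T> {
    private final RecordDeserializer<T> delegate;
    private long skipped;   // how many records were dropped so far

    SkippingDeserializer(RecordDeserializer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T deserialize(byte[] message) {
        try {
            return delegate.deserialize(message);
        } catch (IOException | RuntimeException e) {
            skipped++;      // count the failure so that skips do not go unnoticed
            return null;    // signal "skip this record" to the caller
        }
    }

    long skippedCount() {
        return skipped;
    }
}
```

Keeping the skip logic in a wrapper rather than in each format-specific class is one possible design; whether it should be optional and how skipped records are reported would be up to the actual feature.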

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.sharipov@cleverdata.ru
mobile: +7 (925) 416-37-26

CleverDATA
make your data clever


Re: [deserialization schema] skip data, that couldn't be properly deserialized

Posted by Rinat <r....@cleverdata.ru>.
Hi Fabian, I have created the issue: https://issues.apache.org/jira/browse/FLINK-10525

Thanks!

> On 10 Oct 2018, at 16:47, Fabian Hueske <fh...@gmail.com> wrote:
> 
> Hi Rinat,
> 
> Thanks for discussing this idea. Yes, I think this would be a good feature. 
> Can you open a Jira issue and describe the feature?
> 
> Thanks, Fabian


Re: [deserialization schema] skip data, that couldn't be properly deserialized

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Rinat,

Thanks for discussing this idea. Yes, I think this would be a good feature.
Can you open a Jira issue and describe the feature?

Thanks, Fabian
