Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/03/05 02:13:39 UTC

[GitHub] [pulsar] sijie edited a comment on issue #3741: POJO AvroSchema always allowNull

sijie edited a comment on issue #3741: POJO AvroSchema always allowNull
URL: https://github.com/apache/pulsar/issues/3741#issuecomment-469506795
 
 
   > Given a POJO generated by Avro, there is no way to determine whether this POJO was generated with a schema that allowed or not for null types.
   > but that's separate from the point that that alone won't guarantee we can generate the correct schema starting from the generated POJO.
   
   @merlimat 
   
   The issue I am creating here is for `AllowNull`. We found that `AllowNull` is a problem in two use cases: 1) the one reported by @skyrocknroll, and 2) the one that @codelipenghui hit.
   
   The whole picture of @skyrocknroll's problem is: Avro file => Avro-generated POJO. The schema defined in the Avro file is not compatible with the schema generated by pulsar.AvroSchema(generated POJO). One of the problems is that `AllowNull` completely changes the schema definitions. Whether removing `AllowNull` fully addresses this problem is a separate issue to be addressed, although I would expect Avro to handle this well. We shouldn't couple the discussion of that specific issue with the broader issue introduced by `AllowNull`.
   
   > It's a 100% correct solution for that case. I don't see what's limited about that.
   
   @merlimat 
   
   Generated POJO is the use case reported and discussed in the Slack channel. If you only handle generated POJOs by using `getClassSchema`, you are not covering the many other data sources that generate Avro schemas using ReflectData.
   
   > the contract you are creating when generating an avro schema using the ReflectData api is with the java class itself not some other tool or system. 
   
   @jerrypeng 
   
   I am not creating any contract. These come from real use cases. Also, the whole discussion is about POJOs only; it is not a cross-language issue or a user-customized schema issue (although it was found during cross-language use).
   
   1) Suppose a user takes an Avro schema file (let's call this schema A) and uses Avro tools to generate a POJO class from it. The user then uses that POJO class with the Pulsar Avro schema to generate another schema B. Ideally, A and B should be compatible.
   
   ```
   schema A => generated pojo => (pulsar AvroSchema) => schema B
   
                          ||
                          \/
   
   schema A => generated pojo => ReflectData.AllowNull.parse => schema B
   ```
   
   `AllowNull` is what prevents them from being compatible. I don't know whether removing `AllowNull` fully addresses this problem. That's a specific issue to address for Avro-generated POJOs. I hope Avro provides the right tools to convert back and forth between schemas and POJOs; otherwise, IMO, it is a problem in Avro.
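
To make the effect concrete, here is a minimal sketch of how nullable wrapping changes a record schema. It is a simplified model written with plain Python dicts, not Avro's or Pulsar's actual code: the `UserProfile` fields and the `allow_null` helper are hypothetical, and the sketch assumes that `AllowNull` turns each field type `T` into the union `["null", T]`.

```python
import copy
import json

# Hypothetical stand-in for "schema A"; the field names are illustrative only.
SCHEMA_A = {
    "type": "record",
    "name": "UserProfile",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}


def allow_null(schema):
    """Simplified model of AllowNull-style generation (an assumption, not
    Avro's implementation): make every field type a union that includes null."""
    out = copy.deepcopy(schema)
    for field in out["fields"]:
        field_type = field["type"]
        if isinstance(field_type, list):
            # Already a union: add "null" only if it is missing.
            if "null" not in field_type:
                field["type"] = ["null"] + field_type
        else:
            field["type"] = ["null", field_type]
    return out


# "Schema B" as it would look after the nullable wrapping.
SCHEMA_B = allow_null(SCHEMA_A)

for a, b in zip(SCHEMA_A["fields"], SCHEMA_B["fields"]):
    print(a["name"], json.dumps(a["type"]), "->", json.dumps(b["type"]))
print("identical:", SCHEMA_A == SCHEMA_B)
```

Under these assumptions every field definition in B differs from A, even though both were derived from the same class.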
   
   If we removed `AllowNull`, the flow would change to the following. The scope of the problem is then different: whether `ReflectData` is the right tool to handle Avro-generated POJOs, or whether Avro even provides tools to guarantee such conversions.
   
   ```
   schema A => generated pojo => ReflectData.parse => schema B
   ```
   
   2) Imagine I have a data system A (e.g. Spark or Flink) and Pulsar. I have a POJO class (e.g. UserProfile) defined across the whole organization. The schemas generated in the different systems are completely incompatible even though they use the same POJO. When data flows between Pulsar and the other systems, it might not be processed properly because of the incompatible schemas. Pulsar is the message bus for exchanging data between other systems. If it produces a schema that is incompatible with those of the other systems, IMO that's a very serious bug.
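
As a rough illustration of why this matters between systems, the sketch below checks whether a reader's field type can accept everything a writer's field type may produce. It is a deliberately simplified model (the `can_read` helper is hypothetical, and it ignores type promotion, nested records, aliases, and defaults), assuming the only difference between the two systems is the nullable wrapping.

```python
def can_read(reader_type, writer_type):
    """Simplified check: every branch the writer may emit must also be a
    branch of the reader. A non-union type is treated as a one-branch union."""
    writer_branches = writer_type if isinstance(writer_type, list) else [writer_type]
    reader_branches = reader_type if isinstance(reader_type, list) else [reader_type]
    return all(branch in reader_branches for branch in writer_branches)


plain = "string"               # field type as another system might derive it
nullable = ["null", "string"]  # same field after AllowNull-style wrapping

# A reader expecting the nullable type can read data written as plain string.
print("nullable reads plain:", can_read(nullable, plain))
# A reader expecting plain string cannot accept a writer that may emit null.
print("plain reads nullable:", can_read(plain, nullable))
```

In this model, compatibility holds in one direction only: a consumer using the nullable schema can read plain data, but a consumer using the plain schema cannot safely read data from a producer whose schema permits null.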
