Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/01 10:58:28 UTC

[GitHub] [pulsar] eolivelli commented on pull request #9343: Issue 9004: Pulsar Schema API: provide Type information for Fields

eolivelli commented on pull request #9343:
URL: https://github.com/apache/pulsar/pull/9343#issuecomment-770768654


   Sorry, I didn't mean that Pulsar IO is a competitor to Kafka Connect; I only want to share my experience porting projects that run on Kafka Connect to Pulsar IO Sinks.
   
   I recently got stuck on this problem: when you deal with GenericRecord you have no metadata about the Schema. You can only access the raw schema bytes, or you have to use Java reflection to work out the type of each field (and that is also a problem, because for null values you cannot know the original data type).
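
   To make the null problem concrete, here is a minimal sketch. It does not use the Pulsar API: a plain `Map<String, Object>` stands in for a record, and `ReflectionTypeSketch`, `inferType` and `inferTypes` are hypothetical names chosen for illustration. It only shows why inferring types from runtime values breaks down as soon as a field is null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not Pulsar code: a Map plays the role of a record
// that exposes field values but no per-field type metadata.
public class ReflectionTypeSketch {

    // Infers a field's type from its runtime value, which is the only
    // option when the record carries no schema metadata for its fields.
    static String inferType(Object value) {
        // A null value has no class, so the declared type is lost.
        return value == null ? "UNKNOWN" : value.getClass().getSimpleName();
    }

    // Applies inferType to every field, preserving field order.
    static Map<String, String> inferTypes(Map<String, Object> record) {
        Map<String, String> types = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            types.put(entry.getKey(), inferType(entry.getValue()));
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("name", "alice");
        record.put("age", null); // declared as an integer upstream, but null here
        // prints {id=Integer, name=String, age=UNKNOWN}
        System.out.println(inferTypes(record));
    }
}
```

   A sink that has to create a target column for `age` cannot decide on a type from this record alone.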
   
   GenericRecord is a wonderful abstraction over generic data structures, and it looks to me that the biggest missing piece is this kind of metadata about the structure.
   
   GenericRecord already provides many APIs to create records and schemas (RecordSchemaBuilder and GenericRecordBuilder); the only missing part is a way to "read" that metadata.
   
   Once we have this API, working with structured data in Pulsar IO (and probably in Pulsar Functions in general) will be much easier.
   
   Dealing directly with Avro (and Protobuf...) is not a good way to go for my use cases, because I would have to explicitly write code for every schema type and also get into the details of every technology.
   
   For simple cases (though GenericRecord is not that simple; it is already very powerful, since it can deal with nested structs), it is a great feature to be able to code your Pulsar Sink independently of the physical encoding of the records, the same way we do with compression.
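
   As a rough sketch of what encoding-independent sink code could look like once field types are readable: everything below is hypothetical and is not the API proposed in this PR. `FieldType`, `FieldInfo` and `createTable` are illustrative names, and the SQL mapping is just one example of a consumer of declared types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Pulsar code: a sink that relies on declared
// field types instead of inspecting values or a concrete encoding.
public class DeclaredTypeSinkSketch {

    // Hypothetical logical types; not Pulsar's own type enumeration.
    enum FieldType { INT32, INT64, STRING, BOOLEAN }

    // Hypothetical field descriptor: a name plus its declared type.
    static final class FieldInfo {
        final String name;
        final FieldType type;

        FieldInfo(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    // Maps a declared logical type to a SQL column type.
    static String sqlType(FieldType type) {
        switch (type) {
            case INT32: return "INTEGER";
            case INT64: return "BIGINT";
            case STRING: return "VARCHAR";
            case BOOLEAN: return "BOOLEAN";
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Builds a CREATE TABLE statement from field metadata only, so it
    // works the same whether or not any given record has null values.
    static String createTable(String table, List<FieldInfo> fields) {
        List<String> columns = new ArrayList<>();
        for (FieldInfo field : fields) {
            columns.add(field.name + " " + sqlType(field.type));
        }
        return "CREATE TABLE " + table + " (" + String.join(", ", columns) + ")";
    }

    public static void main(String[] args) {
        List<FieldInfo> fields = Arrays.asList(
                new FieldInfo("id", FieldType.INT64),
                new FieldInfo("name", FieldType.STRING),
                new FieldInfo("active", FieldType.BOOLEAN));
        // prints CREATE TABLE users (id BIGINT, name VARCHAR, active BOOLEAN)
        System.out.println(createTable("users", fields));
    }
}
```

   The sink never touches Avro or Protobuf classes here; it only needs the field names and their declared types.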
   
   
   cc @shiv4289 @aahmed-se 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org