Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 03:15:45 UTC

[GitHub] [hudi] nsivabalan commented on issue #8065: [SUPPORT] Deltastreamer AvroKafka Schema Evolution transiently failing in --continuous mode

nsivabalan commented on issue #8065:
URL: https://github.com/apache/hudi/issues/8065#issuecomment-1452897668

   hey @danielfordfc 
   
   I think we have a hunch about what's going on.
   If you have some time and are willing to contribute, let me know.
   
   The issue is that getSourceSchema in the SchemaRegistryProvider is not idempotent. Even within a single write batch, if we call getSourceSchema multiple times, each call could return whatever the latest schema in the schema registry is at that moment. Ideally, we want it to return one schema for one write batch.
   So the fix is to add a new API to the Source abstract class called "clearCaches" or "cleanupResources", and to add a similar API to SchemaProvider. Within source.clearCaches, we will call schemaProvider.clearCaches.
   In the case of SchemaRegistryProvider, for every batch we will fetch the schema from the remote schema registry and cache it locally. Subsequent calls to getSourceSchema will return the same cached value. Before moving on to consume the next batch, we will have to call clearCaches, which will invalidate the local cache of the source schema.
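
   Roughly, the shape I have in mind looks like the sketch below. It is only an illustration: SketchSchemaProvider, CachingRegistrySchemaProvider, SketchSource and schemaForCurrentBatch are hypothetical stand-ins, not the actual Hudi classes or signatures, and the registry call is faked with a Supplier so the snippet runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a schema provider; NOT the real Hudi SchemaProvider.
abstract class SketchSchemaProvider {
  abstract String getSourceSchema();

  // Default no-op, so providers that hold no cache need no changes.
  void clearCaches() {}
}

// Hypothetical registry-backed provider that pins one schema per batch.
class CachingRegistrySchemaProvider extends SketchSchemaProvider {
  // Assumption: stands in for the remote schema registry fetch.
  private final Supplier<String> registryFetch;
  private String cachedSourceSchema;

  CachingRegistrySchemaProvider(Supplier<String> registryFetch) {
    this.registryFetch = registryFetch;
  }

  @Override
  synchronized String getSourceSchema() {
    // Fetch remotely only on the first call after the cache was cleared.
    if (cachedSourceSchema == null) {
      cachedSourceSchema = registryFetch.get();
    }
    return cachedSourceSchema;
  }

  @Override
  synchronized void clearCaches() {
    cachedSourceSchema = null;
  }
}

// Hypothetical stand-in for the Source abstract class.
class SketchSource {
  private final SketchSchemaProvider schemaProvider;

  SketchSource(SketchSchemaProvider schemaProvider) {
    this.schemaProvider = schemaProvider;
  }

  String schemaForCurrentBatch() {
    return schemaProvider.getSourceSchema();
  }

  // Called before moving on to the next batch; delegates to the provider.
  void clearCaches() {
    schemaProvider.clearCaches();
  }
}

class ClearCachesSketch {
  public static void main(String[] args) {
    AtomicInteger version = new AtomicInteger();
    // Fake registry: every remote fetch returns a newer schema version.
    SketchSource source = new SketchSource(
        new CachingRegistrySchemaProvider(() -> "schema-v" + version.incrementAndGet()));

    String first = source.schemaForCurrentBatch();
    String second = source.schemaForCurrentBatch();
    System.out.println(first.equals(second)); // true: same schema within a batch

    source.clearCaches(); // batch boundary
    System.out.println(source.schemaForCurrentBatch()); // schema-v2
  }
}
```

   The no-op default on the base class is just one option; it keeps providers that hold no cache untouched. Where exactly the write loop should call clearCaches is something to settle in the patch.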
   
   If you are interested, can you take it up?
   I can help review the patch.
   

