Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/25 22:29:35 UTC

[GitHub] [pinot] amrishlal commented on pull request #8413: Allow transform expressions on same column as source and sink.

amrishlal commented on pull request #8413:
URL: https://github.com/apache/pinot/pull/8413#issuecomment-1079489100


   > Let's hold a little bit on merging this and have some high level discussion first.
   > 
   > We intentionally reject ingestion transforms with the same input and output column because they are not idempotent and can cause unexpected behavior if by any chance the same record is transformed twice. Also, in certain scenarios, the input data might already have the final column generated, and we just skip the transform. I would be super careful with this change because we need to ensure the record is never transformed twice. Another concern is that if the ingestion transform changes, there is no way to regenerate the derived column because the original values have already been changed. IMO, loosening this restriction can easily cause unexpected behavior and might not be worth it.
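The idempotency concern in the quoted comment can be illustrated with a minimal Python sketch. This is not Pinot's ingestion code: `apply_transforms`, `seconds_to_millis`, and the column names `ts_sec`, `ts_ms`, and `ts` are hypothetical. It applies a seconds-to-milliseconds transform twice, once writing into a separate derived column and once overwriting the source column in place.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical transform spec: output column -> (input column, function).
Transforms = Dict[str, Tuple[str, Callable[[Any], Any]]]


def apply_transforms(record: Dict[str, Any], transforms: Transforms) -> Dict[str, Any]:
    """Return a copy of `record` with each transform applied once to its input column."""
    out = dict(record)
    for dest, (src, fn) in transforms.items():
        out[dest] = fn(record[src])
    return out


def seconds_to_millis(v: int) -> int:
    return v * 1000


# Derived column: source 'ts_sec' is kept, sink is 'ts_ms'.
derived = {"ts_ms": ("ts_sec", seconds_to_millis)}
# Same column as source and sink: 'ts' is overwritten in place.
in_place = {"ts": ("ts", seconds_to_millis)}

once_derived = apply_transforms({"ts_sec": 5}, derived)
twice_derived = apply_transforms(once_derived, derived)  # recomputed from ts_sec

once_in_place = apply_transforms({"ts": 5}, in_place)
twice_in_place = apply_transforms(once_in_place, in_place)  # multiplies again

print(twice_derived["ts_ms"])  # 5000
print(twice_in_place["ts"])    # 5000000
```

Running the transform a second time leaves the derived column unchanged, but multiplies the in-place column again. Once `ts` has been overwritten, the original seconds value is also gone, so a changed transform cannot be re-run against it.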
   
   The problem we are running into is that, for GDPR and similar requirements, we need to be able to purge records from a segment based on the values of a particular field. If we change the name of the field being ingested into Pinot, we lose the information that column 'x' in the Pinot table actually came from field 'y' in the Kafka event / Avro schema, and hence Minion cannot automatically purge records based on the original Avro schema field name 'y'.
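A minimal sketch of the purge problem, assuming (hypothetically) a purge step that only knows the original source field name and keeps no mapping to renamed Pinot columns. The `purge` function and the `user_id` / `user_id_norm` columns are illustrative only, not the API of Pinot's Minion purge task.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def purge(rows: List[Row], source_field: str, value: Any) -> List[Row]:
    """Keep only rows whose column named `source_field` does not equal `value`.

    Hypothetical purge step: it looks up the column by the original Avro/Kafka
    field name. A missing column yields None, so for a non-None `value` the
    row silently survives.
    """
    return [row for row in rows if row.get(source_field) != value]


# Column kept under the source field name (transform writes back to 'user_id').
same_name = [{"user_id": "u1"}, {"user_id": "u2"}]
# Column renamed during ingestion ('user_id' -> 'user_id_norm').
renamed = [{"user_id_norm": "u1"}, {"user_id_norm": "u2"}]

print(len(purge(same_name, "user_id", "u1")))  # 1: the 'u1' row is removed
print(len(purge(renamed, "user_id", "u1")))    # 2: nothing is removed
```

With the renamed column, the purge silently removes nothing, which is the failure mode described above unless the source-to-column mapping is recorded somewhere the purge step can read it.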
   
   Definitely open to suggestions and discussion, but my understanding is that ingestion transform functions are applied only during ingestion, where the original field comes from Kafka/Avro and the transformed value goes into the Pinot column, so this should be safe, right? If you have a particular use case in mind that may not be safe, I can try it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org