Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/25 22:46:24 UTC

[GitHub] [pinot] Jackie-Jiang commented on pull request #8413: Allow transform expressions on same column as source and sink.

Jackie-Jiang commented on pull request #8413:
URL: https://github.com/apache/pinot/pull/8413#issuecomment-1079496589


   > The problem we are running into is that for GDPR etc., we need to be able to purge records from a segment based on the values of a particular field. If we change the name of the field that is being ingested into Pinot, we lose the information that column 'x' in the Pinot table actually came from field 'y' in the Kafka event / Avro schema, and hence cannot purge records automatically in minion based on the original Avro schema field name 'y'.
   
   If you transform the value within column 'x', then even if you can find the column, the value is no longer the original value, so how do you apply the purge logic?
   Also, even if you can modify the value properly, generating the new segment runs the record transformer over the records again, so the transform is applied twice. If the record transform step is skipped, there is no guarantee that the value type is correct.
   Making all transforms idempotent would make this much more robust.
   
   > Definitely open to suggestions and discussion, but my understanding is that ingestion transform functions are applied only during ingestion, where the original field is in Kafka/Avro and the transformed value goes into the Pinot column, so this should be safe, right? If you have any particular use case that may not be safe, I can try it out.
   
   Ingestion transforms can be used during ingestion and also during reload to generate the derived column. Also, on the minion side, a segment can be read as a source file and fed into the ingestion engine again, which may transform the records a second time. We have to take extra care to get this right if the transform is not idempotent.
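
To make the double-application concern concrete, here is a minimal toy sketch in Python. It is not Pinot code: `run_transform`, `seconds_to_millis`, and `normalize_country` are hypothetical names, and the model assumes a transform that reads from and writes to the same column, with the stored row later fed through the same transforms a second time.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def seconds_to_millis(value: int) -> int:
    # Not idempotent: every application multiplies by 1000 again.
    return value * 1000


def normalize_country(value: str) -> str:
    # Idempotent: trimming and upper-casing an already normalized value is a no-op.
    return value.strip().upper()


def run_transform(record: Record, column: str, fn: Callable[[Any], Any]) -> Record:
    """Apply fn to `column` and write the result back to the same column."""
    out = dict(record)  # copy so the input record is left untouched
    out[column] = fn(out[column])
    return out


# First pass: the record arrives from the stream and is transformed.
ingested = run_transform({"ts": 1648248384, "country": " us "}, "ts", seconds_to_millis)
ingested = run_transform(ingested, "country", normalize_country)

# Second pass: the stored row is read back and fed through the same transforms.
reprocessed = run_transform(ingested, "ts", seconds_to_millis)
reprocessed = run_transform(reprocessed, "country", normalize_country)

print(ingested)
print(reprocessed)
```

In this sketch the `country` value is the same after both passes, while `ts` is scaled a second time and no longer represents the original timestamp. That is the failure mode an idempotent transform avoids.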


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


