Posted to commits@pinot.apache.org by "egalpin (via GitHub)" <gi...@apache.org> on 2023/07/05 18:51:10 UTC

[GitHub] [pinot] egalpin commented on pull request #11027: Preserve non-null comparison column values during segment commit

egalpin commented on PR #11027:
URL: https://github.com/apache/pinot/pull/11027#issuecomment-1622295892

   The motivation for keeping nullness encoded is to make it easy to distinguish a defaultValue representation of null from an actual null, so that comparing two defaultValues does not produce a comparison result of `0` and thereby trigger an upsert.
   
   I don't feel that this is a required "guard" anymore, though. It may have been needed at one point, but the algorithms for multiple comparison column upsert changed a lot over the course of implementation. We guard against any newly ingested record having all null comparison columns[1], so by the time we reach `compareTo` the values being compared would "at worst" be the `<defaultValue>` of the previously ingested record and the non-null value (i.e. non-defaultValue) of the newly ingested record. Such a comparison would have the same result as the current implementation, which checks whether the previous record's value for the same comparison index is null and, if so, accepts the new record as a valid upsert.
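
A minimal sketch of the reasoning above, not Pinot's actual code. The class and method names (`ComparisonSketch`, `hasAnyNonNull`, `compareNullAware`, `compareWithDefault`) are hypothetical, and the sketch assumes the defaultValue sorts below every real comparison value (`Long.MIN_VALUE` is used here purely for illustration):

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration only; this is not Pinot's ComparisonColumns class.
final class ComparisonSketch {
    private ComparisonSketch() {}

    // Guard applied before any comparison: a newly ingested record whose
    // comparison columns are all null is rejected outright.
    static boolean hasAnyNonNull(Object[] comparisonValues) {
        return Arrays.stream(comparisonValues).anyMatch(Objects::nonNull);
    }

    // Null-aware comparison: if the previous record has no value at this
    // comparison index, the new record is accepted (positive result).
    static <T extends Comparable<T>> int compareNullAware(T newValue, T previousValue) {
        if (previousValue == null) {
            return 1;
        }
        return newValue.compareTo(previousValue);
    }

    // Default-value comparison: a missing previous value is replaced by the
    // defaultValue before comparing. ASSUMPTION: defaultValue sorts below
    // every real value, so a non-default new value still compares greater.
    static <T extends Comparable<T>> int compareWithDefault(T newValue, T previousValue, T defaultValue) {
        T effectivePrevious = (previousValue == null) ? defaultValue : previousValue;
        return newValue.compareTo(effectivePrevious);
    }
}
```

Under that assumption both comparisons return a positive result whenever the previous value is missing and the new value is a real, non-default value, which is the case the guard leaves us with. If a defaultValue could exceed a real value, the two approaches would diverge.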
   
   All that said, if we don't encode nullness then we lose the ability to perform valuable queries like "show me all records that have null `comparisonColumnX`" (i.e. "show me all records which have never had data written by a producer that uses `comparisonColumnX`").
   
   [1] https://github.com/egalpin/pinot/blob/68fdfa4a12926c7ce45dadc6a86d973ce5ff3669/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L597 ( -> this method has moved, I think as of https://github.com/apache/pinot/pull/10703)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: commits-help@pinot.apache.org