You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "ege-st (via GitHub)" <gi...@apache.org> on 2023/06/13 20:00:57 UTC

[GitHub] [pinot] ege-st opened a new issue, #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

ege-st opened a new issue, #10907:
URL: https://github.com/apache/pinot/issues/10907

   ## Issue
   Enabling NULL support on a table _after_ the table has data will result in incorrect query results
   for NULL expressions (i.e., `IS NULL` and `IS NOT NULL`). The Null Value Vector bitmap is used
   by Pinot to identify which values in a column are `null`, when Null Value support is enabled
   Pinot generates a Null Value Vector bitmap for each segent and this segment is used to evaluate
   NULL expressions.  If a Null Value Vector is present then Pinot will return any value which is
   `null` for `IS NULL` and it will return any value which is not null for `IS NOT NULL`; however,
   if there is no Null Value Bitmap in a segment, then Pinot will return no values for `IS NULL`
   and it will return every value in the segment for `IS NOT NULL`. Critically, as Pinot currently
   works, a Null Value Vector bitmap is only created at the time a segment is first created and cannot
   be added afterwards.
   
   The issue arises when a table is created without Null support and then, after data is ingested,
   Null support is enabled. New segments will have a Null Value Vector bitmap, and the older segments
   will not. If a user then issues a `IS NULL` or `IS NOT NULL` query, the older segments will execute
   the query as if Null Support is disabled, and the new segments will the query as if Null Support is
   enabled. The results from each of these segments is then merged together, as if they all evaluated the
   expression the same way, and that is returned to the user as the result. This will result in the user getting
   many false positives in their result set.
   
   ## Suggestion
   There are a few issues here, from the user perspective:
   
   1. A user can run queries using `IS NULL` and `IS NOT NULL` even if a table does not support null
   values.
   2. If Null Value support is added to a table after it already has data, then some segments will return
   incorrect results for a Null expression query.
   
   Some things we should consider doing are:
   
   1. Not allow `IS NULL` or `IS NOT NULL` to be used if a table does not have null support enabled.
   2. If a user wants to enable null support on a table after it was created, we should not allow it,
   unless the user uses an override command and the user explicitly states what the null value for the
   columns should be.
   3. If the user explicitly provides the definition for what the Null value of a column is, then Minion
   can "reindex" a segment and construct the Null Value Vector bitmap.
   4. Queries that use Null expressions will return an error until every segment has been reindexed by
   Minion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shenyu0127 commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "shenyu0127 (via GitHub)" <gi...@apache.org>.
shenyu0127 commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1592026348

   Ideally, we want a fail fast behavior for a NULL support enabled query over NULL support disabled segments.
   
   In reality, we don't fail fast: if a query is mis-configured, the behavior is undefined.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1595571975

   We don't want to enforce user to re-create the table when they want to enable null handling because a lot of users are okay to have old segments not handling null and just the new segments handling null.
   Currently we cannot auto generate null value vector after segment is created because the null value is already replaced by the default value, and the null information is already lost. What we can do is to allow user to config some columns, and pinot just treat the default value as null for the configured columns. Then we can generate the null value vector for these columns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ege-st commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "ege-st (via GitHub)" <gi...@apache.org>.
ege-st commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1613225297

   Ok, this makes sense to me, but had a few more questions to make sure I fully understand. What happens if a user enables Null Support on an existing table but does not configure any columns? If a table has a mixed state where some segments were generating their Null Value Vectors and older segments did not have any configuration for what columns were nullable, then what happens if a user runs a query with `IS [NOT] NULL`?
   
   My primary concern here is that, currently, if someone enables Null Support on an existing table then `IS [NOT] NULL` queries will return invalid results from the segments that don't have Null Value vectors; and I'm not sure that this solution will solve that problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1612255465

   > Would this solution mean that if a user enables Null Values on an existing table they have to configure what columns will support Nulls and then we treat the default value as the null value?
   
   This only applies to existing segments (whether the table exist of not doesn't matter). For new segments, null value vector can be properly generated based on whether the input value is null.
   
   > What would happen if the user enabled NV support on a table that already exists but they don't configure any columns?
   
   New segments will have null value vector properly generated. No changes will happen for existing segments.
   
   > What would happen to the already existing segments? In these we would just treat the default value as null?
   
   The problem of directly treating the default value as null is that we might mistakenly treat real default value as null. This can be very common for metric columns, where default value is 0. That's the reason why I suggested adding a config for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ege-st commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "ege-st (via GitHub)" <gi...@apache.org>.
ege-st commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1593143788

   > Ideally, we want a fail fast behavior for a NULL support enabled query over NULL support disabled segments.
   > 
   > In reality, we don't fail fast: if a query is mis-configured, the behavior is undefined.
   
   Perhaps this would work:
   1. If a segment does not have a Null Value Vector bitmap, the server will return an error.
   2. when a broker is gathering responses to a query, if any response is an Error then the broker will invalidate the query and return an Error to the user.
   
   I think, this would prevent the situation where the response is a mix of correct and incorrect results.
   
   However, it does not guarantee that `IS NULL` and `IS NOT NULL` will only successfully evaluate when all segments have a Null Value Vector bitmap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1613253545

   There isn't much we can to address older data. User has 3 options
   - Be ok with the fact that already ingested data has no NULLs
   - configure a column to use default value as NULL
   - rebootstrap data (this will generate the null bitmap)
   
   My argument here is the behavior for already ingested segment does not change and is a simple invariant to reason about. Throwing errors for already working queries will impact the user experience. If we really want that behavior, we can provide an option to enable strictNullHandling or something like that.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ege-st commented on issue #10907: Enabling Null Support after table creation results in incorrect results for `IS [NOT] NULL` expressions

Posted by "ege-st (via GitHub)" <gi...@apache.org>.
ege-st commented on issue #10907:
URL: https://github.com/apache/pinot/issues/10907#issuecomment-1609640508

   @Jackie-Jiang I have some questions to help me better understand how this would work:
   
   Would this solution mean that if a user enables Null Values on an existing table they have to configure what columns will support Nulls and then we treat the default value as the null value? 
   
   What would happen if the user enabled NV support on a table that already exists but they don't configure any columns?
   
   What would happen to the already existing segments? In these we would just treat the default value as null?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org