Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/01/15 21:59:55 UTC

[GitHub] [incubator-pinot] jackjlli commented on pull request #6361: Detect invalid column names from query in Pinot server

jackjlli commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-761220753


   > It would be good to verify the following before pushing the change:
   > 
   > * Are there any existing prod cases that might break? For example, a new column was added to the schema but no backfill was performed, so only the newer segments have the new column. The new behavior will bail out early, which might differ from the existing behavior.
   > * Double check any performance impact on high throughput cases.
   
   If a new column is added, the `_tableColumnNamesMap` in `HelixInstanceDataManager` is updated with the latest schema on a segment reload or a server restart. After that, a query on the new column passes the check and runs against the new segments. Old segments are not queried because they have been pruned.
   
   If there is neither a segment reload nor a server restart, the table is in an inconsistent state: old segments don't have the new column, and new segments do. Because pinot-server has not been told what the correct schema is, `_tableColumnNamesMap` keeps its existing copy of the column names. A server holding the old copy will return an empty response with an `invalidColumns` flag in the payload. One workaround would mirror the table config update, where a refresh table config message is sent to brokers: when a table schema gets updated, a refresh schema message would be sent to servers. But I think this is similar to (schema update + auto segment reload).
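   
   To make the two cases above concrete, here is a minimal sketch of the check, not the actual code in this PR. The class `ColumnNameCheckSketch` and the methods `refreshColumns` and `findInvalidColumns` are hypothetical names made up for illustration; only the idea of a per-table map of column names comes from the description above. Treating an unknown table as having no known columns is also an assumption.
   
   ```java
   import java.util.Collection;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeSet;
   import java.util.concurrent.ConcurrentHashMap;
   
   // Hypothetical stand-in for the server-side check; not Pinot's real API.
   class ColumnNameCheckSketch {
     // Plays the role of _tableColumnNamesMap: table name -> column names cached on this server.
     private final Map<String, Set<String>> tableColumnNames = new ConcurrentHashMap<>();
   
     // Assumed to run on a segment reload or server restart, replacing the cached copy.
     void refreshColumns(String table, Collection<String> schemaColumns) {
       tableColumnNames.put(table, new HashSet<>(schemaColumns));
     }
   
     // Returns the queried columns that are missing from the cached copy (sorted for stable output).
     Set<String> findInvalidColumns(String table, Collection<String> queriedColumns) {
       Set<String> known = tableColumnNames.getOrDefault(table, Collections.<String>emptySet());
       Set<String> invalid = new TreeSet<>();
       for (String column : queriedColumns) {
         if (!known.contains(column)) {
           invalid.add(column);
         }
       }
       return invalid;
     }
   
     public static void main(String[] args) {
       ColumnNameCheckSketch server = new ColumnNameCheckSketch();
       server.refreshColumns("myTable", java.util.Arrays.asList("colA", "colB"));
   
       // Schema gained newCol, but no reload or restart happened: the stale copy rejects it.
       System.out.println(server.findInvalidColumns("myTable", java.util.Arrays.asList("colA", "newCol")));
   
       // After a reload the cached copy matches the latest schema and the query goes through.
       server.refreshColumns("myTable", java.util.Arrays.asList("colA", "colB", "newCol"));
       System.out.println(server.findInvalidColumns("myTable", java.util.Arrays.asList("colA", "newCol")));
     }
   }
   ```
   
   In this sketch a non-empty result corresponds to the case where the server would bail out early and flag the invalid columns; an empty result means the query can proceed to segment pruning and execution.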
   
   As for the performance impact, I'll benchmark it and post the results here.

