Posted to commits@pinot.apache.org by "shounakmk219 (via GitHub)" <gi...@apache.org> on 2023/05/03 14:15:10 UTC

[GitHub] [pinot] shounakmk219 commented on issue #10675: Failures in RealtimeQuickStart when indexes are changed

shounakmk219 commented on issue #10675:
URL: https://github.com/apache/pinot/issues/10675#issuecomment-1533106608

   I had a look at this. From the description, it looks like there are two issues:
   
   1. The one @Jackie-Jiang pointed out: the exception log comes from the `RealtimeToOfflineSegmentsTask`, which is unable to find the corresponding offline table.
   
   - In `QuickStartBase.DEFAULT_STREAM_TABLE_DIRECTORIES`, the `githubEvents` entry is the only one under `examples/minions/stream`; the other tables sit under `examples/stream`.
   - An `examples/stream/githubEvents` directory does exist, but it does not follow the file naming convention `examples/stream/<TABLE_NAME>/<TABLE_NAME>_realtime_table_config.json`, and its schema differs from that of the current `githubEvents` table.
   - Because `examples/minions/stream/githubEvents` is referenced in other places too, removing the task from its table config may not be the right way to handle this.
   - We can rename the current `examples/stream/githubEvents` to `examples/stream/pullRequestMergedEvents` to match the naming convention, and create a new `examples/stream/githubEvents` as a copy of `examples/minions/stream/githubEvents` without the task in the table config.
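
A minimal sketch of the naming-convention check described above, in Python rather than Pinot's own Java. The helper names and the sample file name `pullRequestMergedEvents_realtime_table_config.json` are hypothetical and only illustrate the `<TABLE_NAME>/<TABLE_NAME>_realtime_table_config.json` pattern; this is not how QuickStart resolves its directories.

```python
from pathlib import PurePosixPath


def expected_realtime_config(stream_dir: str, table_name: str) -> str:
    """Build the path the convention expects for a table's realtime config."""
    return str(
        PurePosixPath(stream_dir)
        / table_name
        / f"{table_name}_realtime_table_config.json"
    )


def follows_convention(path: str, stream_dir: str = "examples/stream") -> bool:
    """Return True if the config file is named after its parent directory."""
    p = PurePosixPath(path)
    table_name = p.parent.name  # the directory name is taken as the table name
    return str(p) == expected_realtime_config(stream_dir, table_name)


# Hypothetical file name: a config named after a different table than its
# directory does not match the convention.
mismatched = (
    "examples/stream/githubEvents/"
    "pullRequestMergedEvents_realtime_table_config.json"
)
matching = expected_realtime_config("examples/stream", "githubEvents")
```

Under these assumptions, `follows_convention(mismatched)` is false and `follows_convention(matching)` is true, which is the kind of mismatch the proposed rename would remove.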
   
   2. The one @gortiz is pointing to, where segments are deleted and indexes behave strangely after an index update + segment reload.
   
   - The data in the `githubEvents` table is from 2021 and the table config sets retention to 1 year, so the reload deletes all the segments.
   - With no data left, the newly created consuming segment shows the odd index view.
   - Increasing the retention time addresses the issue. Let me know if this makes sense, or whether we should update the data instead. Either option will require maintenance in the future.
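
The retention arithmetic can be sketched as follows. This is a simplified model in Python, not Pinot's retention code: the function name is hypothetical, the "now" date is taken from the date of this comment, and the segment end time is an assumed latest-possible value for data from 2021.

```python
from datetime import datetime, timedelta, timezone


def is_past_retention(segment_end: datetime, now: datetime, retention_days: int) -> bool:
    """Simplified rule: a segment is purged once its end time is older than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return segment_end < cutoff


# Assumed values for illustration only.
now = datetime(2023, 5, 3, tzinfo=timezone.utc)           # date of this comment
segment_end = datetime(2021, 12, 31, tzinfo=timezone.utc)  # latest possible 2021 data

purged_with_one_year = is_past_retention(segment_end, now, retention_days=365)
purged_with_ten_years = is_past_retention(segment_end, now, retention_days=3650)
```

With a 1-year retention even the newest 2021 segment falls before the cutoff, while a much longer retention (10 years here, an arbitrary choice) keeps it. The cutoff keeps moving forward, though, which is why both options need maintenance eventually.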


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: commits-help@pinot.apache.org