Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/02 16:05:24 UTC

[GitHub] [pinot] navina opened a new issue, #8819: Updating schema with new MV column causes index failures and stops consumption

navina opened a new issue, #8819:
URL: https://github.com/apache/pinot/issues/8819

   A user added a new MV (multi-value) column to the schema and reloaded the segments. Note that the data in the topic has no values for this new field. The segment build then failed with the following exception:
   ```
   2022/05/31 19:47:55.833 ERROR [LLRealtimeSegmentDataManager_calls__0__856__20220531T1847Z] [calls__0__856__20220531T1847Z] Could not build segment
   java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
           at org.apache.pinot.segment.local.segment.index.readers.constant.ConstantMVForwardIndexReader.getDictIdMV(ConstantMVForwardIndexReader.java:48) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.segment.local.segment.readers.PinotSegmentColumnReader.getValue(PinotSegmentColumnReader.java:89) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.getRecord(PinotSegmentRecordReader.java:226) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.next(PinotSegmentRecordReader.java:215) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:218) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:123) ~[startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:839) [startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:766) [startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:665) [startree-pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-b92cbb4df88eee0bf0fbb39dfc434387370edc8e]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   ```        
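
The message `Index 0 out of bounds for length 0` is what Java reports when code writes to slot 0 of a zero-length array. The following is a minimal, self-contained sketch of that failure mode, not Pinot's actual code. `ConstantMvReaderSketch` and its methods are hypothetical stand-ins, and the sketch assumes that the reader for a default-valued MV column writes one dictionary id into a caller-supplied buffer that was allocated with length 0.

```java
public class ConstantMvReaderSketch {
    // Hypothetical stand-in for a reader that serves one default value per document.
    // Assumption: it writes a single dictionary id into slot 0 and reports one value.
    static int getDictIdMV(int docId, int[] dictIdBuffer) {
        dictIdBuffer[0] = 0;
        return 1;
    }

    // Returns true if reading into a buffer of the given size throws
    // ArrayIndexOutOfBoundsException, as in the log above.
    static boolean failsWithBufferSize(int bufferSize) {
        try {
            getDictIdMV(0, new int[bufferSize]);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("buffer size 0 fails: " + failsWithBufferSize(0));
        System.out.println("buffer size 1 fails: " + failsWithBufferSize(1));
    }
}
```

If this model matches what happens in Pinot, the buffer sizing for the newly added column is the place to look, since a buffer of length 1 would be enough for the single default value.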
   
   ## Steps to reproduce
   1. Run the `RealtimeQuickstart` 
   2. Load some data
   3. `Edit schema` to add a new MV column
   4. Reload the segments 
   
   ### Things to note
   
   * The quickstart uses the high-level consumer, whereas the user was using the low-level consumer.
   * Reproducing with the quickstart triggers the same exception, but this time the stack trace comes from the query path.
   ```
   2022/06/02 11:31:20.041 ERROR [BaseCombineOperator] [pqw-6] Caught exception while processing query: QueryContext{_tableName='meetupRsvp_REALTIME', _subquery=null, _selectExpressions=[event_id, event_name, event_tags, event_time, group_city, group_country, group_id, group_lat, group_lon, group_name, location, mtime, rsvp_count, venue_name], _aliasList=[null, null, null, null, null, null, null, null, null, null, null, null, null, null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=10000}, _expressionOverrideHints={}, _explain=false}
   java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
   	at org.apache.pinot.segment.local.segment.index.readers.constant.ConstantMVForwardIndexReader.getDictIdMV(ConstantMVForwardIndexReader.java:48) ~[classes/:?]
   	at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValuesMV(DataFetcher.java:701) ~[classes/:?]
   	at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:391) ~[classes/:?]
   	at org.apache.pinot.core.common.DataBlockCache.getStringValuesForMVColumn(DataBlockCache.java:462) ~[classes/:?]
   	at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesMV(ProjectionBlockValSet.java:179) ~[classes/:?]
   	at org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:84) ~[classes/:?]
   	at org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:33) ~[classes/:?]
   	at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:97) ~[classes/:?]
   	at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:40) ~[classes/:?]
   	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:39) ~[classes/:?]
   	at org.apache.pinot.core.operator.combine.BaseCombineOperator.processSegments(BaseCombineOperator.java:158) ~[classes/:?]
   	at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:101) [classes/:?]
   	at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) [classes/:?]
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
   	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) [guava-20.0.jar:?]
   	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) [guava-20.0.jar:?]
   	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) [guava-20.0.jar:?]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   	
   * Another exception seen, tangential to this issue, is a `NumberFormatException` thrown from `SegmentsValidationAndRetentionConfig` when it is called by `SegmentStatusChecker`:
   ```
   2022/06/02 11:32:53.528 ERROR [SegmentStatusChecker] [pool-17-thread-4] Caught exception while updating segment status for table meetupRsvp_REALTIME
   java.lang.NumberFormatException: null
   	at java.lang.Integer.parseInt(Integer.java:614) ~[?:?]
   	at java.lang.Integer.parseInt(Integer.java:770) ~[?:?]
   	at org.apache.pinot.spi.config.table.SegmentsValidationAndRetentionConfig.getReplicasPerPartitionNumber(SegmentsValidationAndRetentionConfig.java:176) ~[classes/:?]
   	at org.apache.pinot.controller.helix.SegmentStatusChecker.updateTableConfigMetrics(SegmentStatusChecker.java:143) ~[classes/:?]
   	at org.apache.pinot.controller.helix.SegmentStatusChecker.processTable(SegmentStatusChecker.java:113) ~[classes/:?]
   	at org.apache.pinot.controller.helix.SegmentStatusChecker.processTable(SegmentStatusChecker.java:56) ~[classes/:?]
   	at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.processTables(ControllerPeriodicTask.java:116) ~[classes/:?]
   	at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.runTask(ControllerPeriodicTask.java:85) ~[classes/:?]
   	at org.apache.pinot.core.periodictask.BasePeriodicTask.run(BasePeriodicTask.java:150) ~[classes/:?]
   	at org.apache.pinot.core.periodictask.BasePeriodicTask.run(BasePeriodicTask.java:135) ~[classes/:?]
   	at org.apache.pinot.core.periodictask.PeriodicTaskScheduler.lambda$start$0(PeriodicTaskScheduler.java:87) ~[classes/:?]
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
   	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   
   ```
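
`java.lang.NumberFormatException: null` raised directly inside `Integer.parseInt` is consistent with a `null` string being parsed. Below is a minimal sketch of that behavior, assuming the replicas-per-partition value is simply absent from the table config. `ParseReplicasSketch` and `parseOrDefault` are hypothetical names used for illustration and are not part of Pinot.

```java
public class ParseReplicasSketch {
    // Hypothetical helper: falls back to a default when the configured value is missing.
    static int parseOrDefault(String value, int defaultValue) {
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    // Returns true if Integer.parseInt rejects the input with a NumberFormatException.
    static boolean parseThrows(String value) {
        try {
            Integer.parseInt(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("parseInt(null) throws: " + parseThrows(null));
        System.out.println("guarded parse of null: " + parseOrDefault(null, 1));
        System.out.println("guarded parse of \"3\": " + parseOrDefault("3", 1));
    }
}
```

Whether a default is the right behavior for a missing value is a design decision for the controller code; the sketch only shows why the unguarded call fails.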
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on issue #8819: Updating schema with new MV column causes index failures and stops consumption

Posted by GitBox <gi...@apache.org>.
navina commented on issue #8819:
URL: https://github.com/apache/pinot/issues/8819#issuecomment-1218392411

   This issue can be reproduced by adding the following test query to `BaseClusterIntegrationTestSet#testReload`:
   `"SELECT SUMMV(NewIntMVDimension) FROM " + rawTableName`
   
   The existing integration tests did not catch the problem because simple `select *` queries are limited to 10 records by default. Raising the limit makes the query very expensive and likely to run out of heap space. Using an aggregation function on the column instead ensures that the query also reads from the consuming segment.
   
   The test has been updated in PR #9205.
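
The reasoning above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Pinot's query engine: segments are modeled as plain arrays scanned one after another, and `SegmentCoverageSketch` and its methods are hypothetical names. It shows why a selection with a small limit may never reach the last (consuming) segment, while an aggregation has to read every segment.

```java
import java.util.List;

public class SegmentCoverageSketch {
    // Counts how many segments a LIMIT-bounded selection reads before it has enough rows.
    static int segmentsReadBySelection(List<int[]> segments, int limit) {
        int rows = 0;
        int segmentsRead = 0;
        for (int[] segment : segments) {
            if (rows >= limit) {
                break; // enough rows collected, remaining segments are skipped
            }
            segmentsRead++;
            rows += Math.min(segment.length, limit - rows);
        }
        return segmentsRead;
    }

    // An aggregation has no early exit: every segment contributes to the result.
    static int segmentsReadByAggregation(List<int[]> segments) {
        int segmentsRead = 0;
        for (int[] segment : segments) {
            segmentsRead++;
        }
        return segmentsRead;
    }

    public static void main(String[] args) {
        // Two larger segments followed by a small one standing in for the consuming segment.
        List<int[]> segments = List.of(new int[20], new int[20], new int[5]);
        System.out.println("selection, limit 10: " + segmentsReadBySelection(segments, 10));
        System.out.println("aggregation: " + segmentsReadByAggregation(segments));
    }
}
```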




[GitHub] [pinot] Jackie-Jiang closed issue #8819: Updating schema with new MV column causes index failures and stops consumption

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #8819: Updating schema with new MV column causes index failures and stops consumption
URL: https://github.com/apache/pinot/issues/8819




[GitHub] [pinot] navina commented on issue #8819: Updating schema with new MV column causes index failures and stops consumption

Posted by GitBox <gi...@apache.org>.
navina commented on issue #8819:
URL: https://github.com/apache/pinot/issues/8819#issuecomment-1145058747

   Looks like I can't assign it to myself. But I will pick this up!

