Posted to jira@kafka.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/17 04:21:00 UTC

[jira] [Commented] (KAFKA-9727) Flaky system test StreamsEOSTest.test_failure_and_recovery

    [ https://issues.apache.org/jira/browse/KAFKA-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060622#comment-17060622 ] 

ASF GitHub Bot commented on KAFKA-9727:
---------------------------------------

abbccdda commented on pull request #8307: KAFKA-9727: cleanup the state store for standby task dirty close and check null for changelogs
URL: https://github.com/apache/kafka/pull/8307
 
 
   This PR fixes two things:
   
   1. The EOS standby task should also wipe out its state on a dirty close.
   2. The changelog reader should also check for null.
   
   The sequence that reproduces the system test failure:
   
   1. The Streams job closes uncleanly, leaving active task 0_0 with no committed offset.
   2. Task 0_0 switches from active to standby; under EOS, a standby task never writes anything to the checkpoint.
   3. Task 0_0 hits an illegal state because it cannot find its checkpoints, and throws a task corrupted exception.
   4. The exception is caught and the task is closed, but the state store was already registered.
   5. On the next iteration we hit "lock not available", because the lock is never released.
   6. We also hit an NPE in the changelog removal, because the changelog was never registered.
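
The sequence above can be sketched as a small toy model. This is only an illustration under stated assumptions, not the code in the PR: `DirtyCloseSketch`, `initializeStandby`, `closeDirty`, `removeChangelog`, and the in-memory sets and map are all hypothetical stand-ins for the task directory lock, the local state, and the changelog reader. The `withFix` flag switches on both changes described above: wiping state (and releasing the lock) on a dirty close, and null-checking before touching a changelog.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the failure sequence. None of these names are Kafka Streams APIs.
public class DirtyCloseSketch {

    static class TaskCorruptedException extends RuntimeException {
        TaskCorruptedException(String taskId) {
            super("no checkpoint found for task " + taskId);
        }
    }

    // Task directories whose lock is currently held (hypothetical stand-in).
    static final Set<String> heldLocks = new HashSet<>();
    // Task directories with local state that no checkpoint accounts for.
    static final Set<String> uncheckpointedState = new HashSet<>();
    // Changelogs registered with a hypothetical changelog reader.
    static final Map<String, StringBuilder> changelogs = new HashMap<>();

    public static void initializeStandby(String taskId) {
        // Step 5: a lock left over from an earlier attempt blocks this one.
        if (!heldLocks.add(taskId)) {
            throw new IllegalStateException("lock not available for task " + taskId);
        }
        // Steps 3-4: the store is registered (lock taken) before the
        // missing checkpoint is detected.
        if (uncheckpointedState.contains(taskId)) {
            throw new TaskCorruptedException(taskId);
        }
        changelogs.put(taskId, new StringBuilder("restoring"));
    }

    public static void removeChangelog(String taskId, boolean nullCheck) {
        StringBuilder changelog = changelogs.remove(taskId);
        if (nullCheck && changelog == null) {
            return; // never registered, so there is nothing to clean up
        }
        // Step 6: throws NullPointerException if the changelog was never registered.
        changelog.setLength(0);
    }

    public static void closeDirty(String taskId, boolean withFix) {
        try {
            removeChangelog(taskId, withFix);
        } catch (NullPointerException e) {
            System.out.println("NPE during changelog removal for task " + taskId);
        }
        if (withFix) {
            uncheckpointedState.remove(taskId); // wipe the state on dirty close
            heldLocks.remove(taskId);           // and release the lock
        }
    }

    // Replays the sequence and reports whether the second initialization succeeds.
    public static boolean reviveSucceeds(boolean withFix) {
        heldLocks.clear();
        uncheckpointedState.clear();
        changelogs.clear();
        uncheckpointedState.add("0_0"); // steps 1-2: state on disk, no checkpoint
        try {
            initializeStandby("0_0");
        } catch (TaskCorruptedException e) {
            closeDirty("0_0", withFix);
        }
        try {
            initializeStandby("0_0");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without fix, revive succeeds: " + reviveSucceeds(false));
        System.out.println("with fix, revive succeeds: " + reviveSucceeds(true));
    }
}
```

In this model the unfixed run fails on the second initialization because the lock is still held, and the fixed run succeeds because the dirty close wiped the state and released the lock first.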
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Flaky system test StreamsEOSTest.test_failure_and_recovery
> ----------------------------------------------------------
>
>                 Key: KAFKA-9727
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9727
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams, system tests
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> The test sometimes hits a "No locks available" exception after a task is revived:
>  
> [2020-03-13 05:40:50,224] ERROR stream-thread [EosTest-46de8ee5-82a1-4bd4-af23-f4acd8515f0f-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-AGGREGATE-STATE-STORE-0000000003 at location /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-0000000003
>         at org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:87)
>         at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:191)
>         at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:230)
>         at org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:44)
>         at org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$0(MeteredKeyValueStore.java:101)
>         at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:806)
>         at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:101)
>         at org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:81)
>         at org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:86)
>         at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:275) 
>         at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:583)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:498)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:472)
> Caused by: org.rocksdb.RocksDBException: lock : /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-0000000003/LOCK: No locks available
>         at org.rocksdb.RocksDB.open(Native Method)
>         at org.rocksdb.RocksDB.open(RocksDB.java:286)
>         at org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:75)
>         ... 14 more


