Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/29 02:02:42 UTC
[GitHub] [doris] wangbo opened a new issue, #11319: [Bug] Fe dead Lock with unknown reseaon
wangbo opened a new issue, #11319:
URL: https://github.com/apache/doris/issues/11319
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
### Version
1.1
### What's Wrong?
The master FE hangs because of a deadlock.
### What You Expected?
The master FE should not hang.
### How to Reproduce?
_No response_
### Anything Else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
Re: [I] [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon [doris]
Posted by "Tanya-W (via GitHub)" <gi...@apache.org>.
Tanya-W commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1794017439
There is also a thread waiting for `StampedLock.writeLock` in TabletInvertedIndex:
```
"DynamicPartitionScheduler" #40 daemon prio=5 os_prio=0 tid=0x0000fff31c0b3800 nid=0x1763d waiting on condition [0x000fff05b6fd001]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000fff48cb0c010> (a java.util.concurrent.locks.StampedLock)
at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
at java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:354)
at org.apache.doris.catalog.TabletInvertedIndex.writeLock(TabletInvertedIndex.java:113)
at org.apache.doris.catalog.TabletInvertedIndex.addTablet(TabletInvertedIndex.java:512)
at org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:128)
at org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:121)
at org.apache.doris.datasource.InternalCatalog.createTablets(InternalCatalog.java:2715)
at org.apache.doris.datasource.InternalCatalog.createPartitionWithIndices(InternalCatalog.java:1828)
at org.apache.doris.datasource.InternalCatalog.addPartition(InternalCatalog.java:1547)
at org.apache.doris.catalog.Env.addPartition(Env.java:2786)
at org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(DynamicPartitionScheduler.java:570)
at org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(DynamicPartitionScheduler.java:631)
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
at org.apache.doris.common.util.Daemon.run(Daemon.java:116)
Locked ownable synchronizers:
- None
```
[GitHub] [doris] wangbo commented on issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon
Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200346027
JDK 8's parallelStream runs on a global thread pool that all callers share; call this the `thread resource`.
`TabletInvertedIndex` has a lock; call this the `lock resource`.
If a lock is acquired outside the thread pool and held while waiting on the pool, a deadlock can occur between the `thread resource` and the `lock resource`.
For example:
```
step 1 Thread 1 gets TabletInvertedIndex's read lock.
step 2 Other threads call parallelStream and the thread pool becomes full.
step 3 Thread 1 tries to submit a task to the thread pool, but it is blocked because the thread pool is full.
step 4 Threads in the thread pool want to get TabletInvertedIndex's read lock or write lock, but the lock is held by Thread 1.
Finally, a deadlock between the thread resource and the lock resource happens.
```
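To make the shared `thread resource` concrete, here is a minimal sketch (the class name `SharedPoolDemo` is made up for illustration and is not Doris code). It assumes the stream is started from an ordinary thread rather than from inside a custom ForkJoinPool, in which case the work lands on the caller thread and on the JVM-wide common pool workers:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of Doris.
class SharedPoolDemo {
    /** Runs a parallel stream and returns the names of the threads that executed it. */
    static Set<String> workerNames() {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, 10_000)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Typically shows the caller thread plus ForkJoinPool.commonPool-worker-N threads,
        // i.e. the same workers every other parallelStream caller in the JVM competes for.
        System.out.println(workerNames());
    }
}
```

Because every caller draws on the same workers, a task that blocks on a lock inside the stream ties up a thread that unrelated callers may be waiting for.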
A key question remains: why can a read lock block another read lock?
See https://zhuanlan.zhihu.com/p/34672421.
In short, even under the unfair policy, read locks cannot always jump ahead of a waiting write lock, so a queued writer can block later readers.
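The read-blocks-read effect can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the Doris code path: it swaps in `ReentrantReadWriteLock` (non-fair) instead of the `StampedLock` seen in the stack traces, because its timed `tryLock` lets the demo terminate instead of hanging. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Doris.
class QueuedWriterDemo {
    /**
     * Returns whether a second reader obtains the read lock while
     * (a) the calling thread holds the read lock and (b) a writer is queued.
     */
    static boolean secondReaderGetsLock() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        lock.readLock().lock();                  // reader 1 holds the read lock
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();             // queues behind reader 1
            lock.writeLock().unlock();
        });
        writer.setDaemon(true);
        writer.start();
        try {
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(1);                 // wait until the writer is in the wait queue
            }
            // A task on another thread (think: a pool worker) now asks for the read lock.
            return CompletableFuture.supplyAsync(() -> {
                try {
                    boolean ok = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (ok) {
                        lock.readLock().unlock();
                    }
                    return ok;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();            // lets the queued writer proceed
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader got the lock: " + secondReaderGetsLock());
    }
}
```

If the second reader used an untimed `lock()` and reader 1 waited for that task to finish before unlocking, neither thread could make progress, which matches the shape of the hang described in this issue.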
[GitHub] [doris] caiconghui commented on issue #11319: [Bug] Fe dead Lock with unknown reseaon
Posted by GitBox <gi...@apache.org>.
caiconghui commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200186902
Actually, the lock is held by this thread:
"Thread-38" #88 daemon prio=5 os_prio=0 cpu=3277969.74ms elapsed=80762.88s tid=0x00007fafd6def3c0 nid=0xa04a in Object.wait() [0x00007faf1eaed000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.2/Native Method)
- waiting on <no object reference available>
at java.util.concurrent.ForkJoinTask.externalAwaitDone(java.base@11.0.2/ForkJoinTask.java:330)
- waiting to re-lock in wait() <0x0000100ff8400838> (a java.util.stream.ForEachOps$ForEachTask)
at java.util.concurrent.ForkJoinTask.doInvoke(java.base@11.0.2/ForkJoinTask.java:412)
at java.util.concurrent.ForkJoinTask.invoke(java.base@11.0.2/ForkJoinTask.java:736)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(java.base@11.0.2/ForEachOps.java:159)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(java.base@11.0.2/ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(java.base@11.0.2/AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(java.base@11.0.2/ReferencePipeline.java:497)
at java.util.stream.ReferencePipeline$Head.forEach(java.base@11.0.2/ReferencePipeline.java:661)
at org.apache.doris.catalog.TabletInvertedIndex.tabletReport(TabletInvertedIndex.java:134)
at org.apache.doris.master.ReportHandler.tabletReport(ReportHandler.java:277)
at org.apache.doris.master.ReportHandler.access$400(ReportHandler.java:96)
at org.apache.doris.master.ReportHandler$ReportTask.exec(ReportHandler.java:241)
at org.apache.doris.master.ReportHandler.runOneCycle(ReportHandler.java:1021)
at org.apache.doris.common.util.Daemon.run(Daemon.java:116)

Locked ownable synchronizers:
- None
[GitHub] [doris] spaces-X commented on issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon
Posted by GitBox <gi...@apache.org>.
spaces-X commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200708441
> JDK 8's parallelStream runs on a global thread pool that all callers share; call this the `thread resource`. `TabletInvertedIndex` has a lock; call this the `lock resource`. If a lock is acquired outside the thread pool and held while waiting on the pool, a deadlock can occur between the `thread resource` and the `lock resource`. For example:
>
> ```
> step 1 Thread 1 gets TabletInvertedIndex's read lock.
> step 2 Other threads call parallelStream and the thread pool becomes full.
> step 3 Thread 1 tries to submit a task to the thread pool, but it is blocked because the thread pool is full.
> step 4 Threads in the thread pool want to get TabletInvertedIndex's read lock or write lock, but the lock is held by Thread 1.
> Finally, a deadlock between the thread resource and the lock resource happens.
> ```
>
> A key question remains: why can a read lock block another read lock? See https://zhuanlan.zhihu.com/p/34672421. In short, even under the unfair policy, read locks cannot always jump ahead of a waiting write lock, so a queued writer can block later readers.
Regarding steps 1 to 3, I think we should follow the principle of acquiring a lock only when it is actually needed, which helps avoid many hard-to-recognize deadlock scenarios.
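One way to apply that principle is to copy the guarded state while holding the read lock, release the lock, and only then hand work to a thread pool. The sketch below is a hypothetical illustration (the class `NarrowLockScope`, the `tabletIds` list and the doubling step are placeholders, not the Doris implementation), and it assumes that working on a slightly stale snapshot is acceptable for the caller:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.StampedLock;
import java.util.stream.Collectors;

// Hypothetical example class, not part of Doris.
class NarrowLockScope {
    private final StampedLock lock = new StampedLock();
    private final List<Long> tabletIds = new ArrayList<>(); // placeholder for the guarded state

    void add(long id) {
        long stamp = lock.writeLock();
        try {
            tabletIds.add(id);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Copies the guarded state under the read lock, then does the parallel work without it. */
    List<Long> processAll() {
        List<Long> snapshot;
        long stamp = lock.readLock();
        try {
            snapshot = new ArrayList<>(tabletIds);
        } finally {
            lock.unlockRead(stamp); // released before any task reaches a thread pool
        }
        // No lock is held here, so pool workers and the caller cannot wait on each other through it.
        return snapshot.parallelStream()
                .map(id -> id * 2) // placeholder for the real per-item work
                .collect(Collectors.toList());
    }
}
```

The trade-off is an extra copy and the possibility that the snapshot is out of date by the time it is processed; whether that is acceptable depends on the caller.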
[GitHub] [doris] caiconghui closed issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon
Posted by GitBox <gi...@apache.org>.
caiconghui closed issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon
URL: https://github.com/apache/doris/issues/11319
Re: [I] [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon [doris]
Posted by "Tanya-W (via GitHub)" <gi...@apache.org>.
Tanya-W commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1793997817
We still encountered this issue on **version 2.0.2-rc03**.
We encountered a prolonged "get tableList write lock timeout" error during continuous load, accompanied by the warning "the report queue size exceeds the limit".
jstack info:
- load txn thread waiting for the `StampedLock.readLock` in TabletInvertedIndex
```
"Thrift-server-pool-22" #452 daemon prio=5 os_prio=0 tid=0x0000fff30002c800 nid=0x1770e waiting on condition [0x0000ffee805fd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000fff48cb0c010> (a java.util.concurrent.locks.StampedLock)
at java.util.concurrent.locks.StampedLock.acquireRead(StampedLock.java:1286)
at java.util.concurrent.locks.StampedLock.readLock(StampedLock.java:428)
at org.apache.doris.catalog.TabletInvertedIndex.readLock(TabletInvertedIndex.java:105)
at org.apache.doris.catalog.TabletInvertedIndex.getTabletMetaList(TabletInvertedIndex.java:365)
at org.apache.doris.transaction.DatabaseTransactionMgr.checkCommitStatus(DatabaseTransactionMgr.java:457)
at org.apache.doris.transaction.DatabaseTransactionMgr.preCommitTransaction2PC(DatabaseTransactionMgr.java:423)
at org.apache.doris.transaction.GlobalTransactionMgr.preCommitTransaction2PC(GlobalTransactionMgr.java:219)
at org.apache.doris.transaction.GlobalTransactionMgr.preCommitTransaction2PC(GlobalTransactionMgr.java:204)
at org.apache.doris.service.FrontendServiceImpl.loadTxnPreCommitImpl(FrontendServiceImpl.java:1376)
at org.apache.doris.service.FrontendServiceImpl.loadTxnPreCommit(FrontendServiceImpl.java:1294)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.doris.service.FeServer.lambda$start$0(FeServer.java:59)
at org.apache.doris.service.FeServer$$Lambda$126/969432090.invoke(Unknown Source)
at com.sun.proxy.$Proxy35.loadTxnPreCommit(Unknown Source)
at org.apache.doris.thrift.FrontendService$Processor$loadTxnPreCommit.getResult(FrontendService.java:3092)
at org.apache.doris.thrift.FrontendService$Processor$loadTxnPreCommit.getResult(FrontendService.java:3072)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x0000fff483d426c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
- <0x0000fff4c343f4b8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
```
- the readLock is held by tabletReport; the two threads involved are:
```
"ForkJoinPool-1-worker-93" #7584 daemon prio=5 os_prio=0 tid=0x000ffee300c000 nid=0x4e51 in Object.wait() [0x0000ffee827fd000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000fff55aad70f8> (a java.util.stream.ForEachOps$ForEachTask)
at java.util.concurrent.ForkJoinTask.internalWait(ForkJoinTask.java:311)
- locked <0x0000fff55aad70f8> (a java.util.stream.ForEachOps$ForEachTask)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2058)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:404)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
at org.apache.doris.catalog.TabletInvertedIndex.lambda$tabletReport$1(TabletInvertedIndex.java:141)
at org.apache.doris.catalog.TabletInvertedIndex$$Lambda$952/131508762.run(Unknown Source)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Locked ownable synchronizers:
- None
```
```
"Thread-54" #90 daemon prio=5 os_prio=0 tid=0x0000fffd3ad32800 nid=0x13061 in Object.wait() [0x0000fff37d0fd000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000fff554117610> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
- locked <0x0000fff554117610> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:391)
at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
at org.apache.doris.catalog.TabletInvertedIndex.tabletReport(TabletInvertedIndex.java:328)
at org.apache.doris.master.ReportHandler.tabletReport(ReportHandler.java:436)
at org.apache.doris.master.ReportHandler.access$600(ReportHandler.java:109)
at org.apache.doris.master.ReportHandler$ReportTask.exec(ReportHandler.java:272)
at org.apache.doris.master.ReportHandler.runOneCycle(ReportHandler.java:1311)
at org.apache.doris.common.util.Daemon.run(Daemon.java:116)
Locked ownable synchronizers:
- None
```