Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/29 02:02:42 UTC

[GitHub] [doris] wangbo opened a new issue, #11319: [Bug] Fe dead Lock with unknown reseaon

wangbo opened a new issue, #11319:
URL: https://github.com/apache/doris/issues/11319

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   1.1
   
   ### What's Wrong?
   
   The master FE hangs because of a deadlock.
   
   ### What You Expected?
   
   The master FE should not hang.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


Re: [I] [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon [doris]

Posted by "Tanya-W (via GitHub)" <gi...@apache.org>.
Tanya-W commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1794017439

   There is also a thread waiting for `StampedLock.writeLock` in TabletInvertedIndex:
   
   ```
   "DynamicPartitionScheduler" #40 daemon prio=5 os_prio=0 tid=0x0000fff31c0b3800 nid=0x1763d waiting on condition [0x000fff05b6fd001]
       java.lang.Thread.State: WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for <0x0000fff48cb0c010> (a java.util.concurrent.locks.StampedLock)
           at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
           at java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:354)
           at org.apache.doris.catalog.TabletInvertedIndex.writeLock(TabletInvertedIndex.java:113)
           at org.apache.doris.catalog.TabletInvertedIndex.addTablet(TabletInvertedIndex.java:512)
           at org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:128)
           at org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:121)
           at org.apache.doris.datasource.InternalCatalog.createTablets(InternalCatalog.java:2715)
           at org.apache.doris.datasource.InternalCatalog.createPartitionWithIndices(InternalCatalog.java:1828)
           at org.apache.doris.datasource.InternalCatalog.addPartition(InternalCatalog.java:1547)
           at org.apache.doris.catalog.Env.addPartition(Env.java:2786)
           at org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(DynamicPartitionScheduler.java:570)
           at org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(DynamicPartitionScheduler.java:631)
           at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
           at org.apache.doris.common.util.Daemon.run(Daemon.java:116)
       
       Locked ownable synchronizers:
           - None
   ```




[GitHub] [doris] wangbo commented on issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon

Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200346027

   JDK 8's parallelStream uses a global thread pool that all callers share; we can call this the `thread resource`.
   `TabletInvertedIndex` has a lock; we call this the `lock resource`.
   If a thread holds the lock while it submits work to the thread pool, a deadlock can occur between the `thread resource` and the `lock resource`.
   For example:
   ```
   step 1 Thread 1 gets TabletInvertedIndex's read lock.
   step 2 Other threads call parallelStream and the thread pool becomes full.
   step 3 Thread 1 tries to submit a task to the thread pool, but it is blocked because the thread pool is full.
   step 4 Threads in the thread pool want to get TabletInvertedIndex's read lock or write lock, but the lock is held by Thread 1.
   Finally, a deadlock between the thread resource and the lock resource happens.
   ```
   
   A key question remains: why can one read lock block another read lock?
   See https://zhuanlan.zhihu.com/p/34672421.
   In short, even under the non-fair policy, a read lock request does not always jump ahead of a waiting write lock request, so a reader can end up queued behind a writer that is itself waiting for Thread 1 to release its read lock.
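
The read-blocks-read behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a plain `java.util.concurrent.locks.ReentrantReadWriteLock` in its default non-fair mode; the class `ReaderBehindWriterDemo` and its method are hypothetical names for illustration and are not taken from the Doris code base. One thread holds the read lock, a writer queues behind it, and a second reader then tries to take the read lock with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it is not part of Doris.
class ReaderBehindWriterDemo {

    // Returns true if a second reader got the read lock while a writer was queued.
    static boolean secondReaderAcquires() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        CountDownLatch firstReaderHolds = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread firstReader = new Thread(() -> {
            lock.readLock().lock();
            try {
                firstReaderHolds.countDown();
                release.await(); // keep holding the read lock until told to release
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        Thread writer = new Thread(() -> {
            lock.writeLock().lock(); // parks while the first reader holds the lock
            lock.writeLock().unlock();
        });

        try {
            firstReader.start();
            firstReaderHolds.await();
            writer.start();
            // Wait until the writer is parked in the lock's wait queue.
            while (writer.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            // Only a read lock is held, but this request has to queue behind the writer.
            boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            release.countDown(); // let the first reader, then the writer, finish
            firstReader.join();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

The second reader is expected to time out even though no thread holds the write lock. If the first reader were itself waiting for pool threads that need the same lock, as in steps 1 to 4, nothing would ever release it.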




[GitHub] [doris] caiconghui commented on issue #11319: [Bug] Fe dead Lock with unknown reseaon

Posted by GitBox <gi...@apache.org>.
caiconghui commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200186902

   Actually, the lock is held by this thread:
   "Thread-38" #88 daemon prio=5 os_prio=0 cpu=3277969.74ms elapsed=80762.88s tid=0x00007fafd6def3c0 nid=0xa04a in Object.wait()  [0x00007faf1eaed000]
      java.lang.Thread.State: WAITING (on object monitor)
           at java.lang.Object.wait(java.base@11.0.2/Native Method)
           - waiting on <no object reference available>
           at java.util.concurrent.ForkJoinTask.externalAwaitDone(java.base@11.0.2/ForkJoinTask.java:330)
           - waiting to re-lock in wait() <0x0000100ff8400838> (a java.util.stream.ForEachOps$ForEachTask)
           at java.util.concurrent.ForkJoinTask.doInvoke(java.base@11.0.2/ForkJoinTask.java:412)
           at java.util.concurrent.ForkJoinTask.invoke(java.base@11.0.2/ForkJoinTask.java:736)
           at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(java.base@11.0.2/ForEachOps.java:159)
           at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(java.base@11.0.2/ForEachOps.java:173)
           at java.util.stream.AbstractPipeline.evaluate(java.base@11.0.2/AbstractPipeline.java:233)
           at java.util.stream.ReferencePipeline.forEach(java.base@11.0.2/ReferencePipeline.java:497)
           at java.util.stream.ReferencePipeline$Head.forEach(java.base@11.0.2/ReferencePipeline.java:661)
           at org.apache.doris.catalog.TabletInvertedIndex.tabletReport(TabletInvertedIndex.java:134)
           at org.apache.doris.master.ReportHandler.tabletReport(ReportHandler.java:277)
           at org.apache.doris.master.ReportHandler.access$400(ReportHandler.java:96)
           at org.apache.doris.master.ReportHandler$ReportTask.exec(ReportHandler.java:241)
           at org.apache.doris.master.ReportHandler.runOneCycle(ReportHandler.java:1021)
           at org.apache.doris.common.util.Daemon.run(Daemon.java:116)

      Locked ownable synchronizers:
           - None
   




[GitHub] [doris] spaces-X commented on issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon

Posted by GitBox <gi...@apache.org>.
spaces-X commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1200708441

   > JDK 8's parallelStream uses a global thread pool that all callers share; we can call this the `thread resource`. `TabletInvertedIndex` has a lock; we call this the `lock resource`. If a thread holds the lock while it submits work to the thread pool, a deadlock can occur between the `thread resource` and the `lock resource`. For example:
   > 
   > ```
   > step 1 Thread 1 gets TabletInvertedIndex's read lock.
   > step 2 Other threads call parallelStream and the thread pool becomes full.
   > step 3 Thread 1 tries to submit a task to the thread pool, but it is blocked because the thread pool is full.
   > step 4 Threads in the thread pool want to get TabletInvertedIndex's read lock or write lock, but the lock is held by Thread 1.
   > Finally, a deadlock between the thread resource and the lock resource happens.
   > ```
   > 
   > A key question remains: why can one read lock block another read lock? See https://zhuanlan.zhihu.com/p/34672421. In short, even under the non-fair policy, a read lock request does not always jump ahead of a waiting write lock request.
   
   
   
   Regarding steps 1 to 3, I think we should follow the principle of acquiring a lock only at the point where it is actually needed, which helps avoid many hard-to-recognize deadlock scenarios.
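
One way to apply that principle is to copy the data needed under the lock, release the lock, and only then hand work to a shared thread pool. The following is a minimal sketch under stated assumptions: `NarrowLockScopeDemo`, its map, and its methods are hypothetical and do not reflect how `TabletInvertedIndex` is actually implemented or how the issue was fixed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in for a lock-protected index; not the Doris implementation.
class NarrowLockScopeDemo {
    private final StampedLock lock = new StampedLock();
    private final Map<Long, Long> tabletToVersion = new HashMap<>(); // guarded by lock

    void addTablet(long tabletId, long version) {
        long stamp = lock.writeLock();
        try {
            tabletToVersion.put(tabletId, version);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    long sumVersionsInParallel() {
        List<Long> snapshot;
        long stamp = lock.readLock();
        try {
            // Hold the lock only long enough to copy what the computation needs.
            snapshot = new ArrayList<>(tabletToVersion.values());
        } finally {
            lock.unlockRead(stamp);
        }
        // No lock is held here, so waiting for pool threads cannot block lock waiters.
        return snapshot.parallelStream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        NarrowLockScopeDemo demo = new NarrowLockScopeDemo();
        for (long i = 1; i <= 100; i++) {
            demo.addTablet(i, i);
        }
        System.out.println(demo.sumVersionsInParallel());
    }
}
```

The trade-off is that the parallel work runs on a snapshot that may already be stale, so this only fits cases where slightly outdated data is acceptable or is re-validated afterwards.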




[GitHub] [doris] caiconghui closed issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon

Posted by GitBox <gi...@apache.org>.
caiconghui closed issue #11319: [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon
URL: https://github.com/apache/doris/issues/11319




Re: [I] [Bug](TabletInvertIndex) Fe dead Lock with unknown reseaon [doris]

Posted by "Tanya-W (via GitHub)" <gi...@apache.org>.
Tanya-W commented on issue #11319:
URL: https://github.com/apache/doris/issues/11319#issuecomment-1793997817

   We still encountered this issue on **version 2.0.2-rc03.**
   We encountered a prolonged "get tableList write lock timeout" error during continuous load, accompanied by the warning "the report queue size exceeds the limit".
   
   jstack info:
   - load txn thread waiting for the `StampedLock.readLock` in TabletInvertedIndex
   ```
   "Thrift-server-pool-22" #452 daemon prio=5 os_prio=0 tid=0x0000fff30002c800 nid=0x1770e waiting on condition [0x0000ffee805fd000]
       java.lang.Thread.State: WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for <0x0000fff48cb0c010> (a java.util.concurrent.locks.StampedLock)
           at java.util.concurrent.locks.StampedLock.acquireRead(StampedLock.java:1286)
           at java.util.concurrent.locks.StampedLock.readLock(StampedLock.java:428)
           at org.apache.doris.catalog.TabletInvertedIndex.readLock(TabletInvertedIndex.java:105)
           at org.apache.doris.catalog.TabletInvertedIndex.getTabletMetaList(TabletInvertedIndex.java:365)
           at org.apache.doris.transaction.DatabaseTransactionMgr.checkCommitStatus(DatabaseTransactionMgr.java:457)
           at org.apache.doris.transaction.DatabaseTransactionMgr.preCommitTransaction2PC(DatabaseTransactionMgr.java:423)
           at org.apache.doris.transaction.GlobalTransactionMgr.preCommitTransaction2PC(GlobalTransactionMgr.java:219)
           at org.apache.doris.transaction.GlobalTransactionMgr.preCommitTransaction2PC(GlobalTransactionMgr.java:204)
           at org.apache.doris.service.FrontendServiceImpl.loadTxnPreCommitImpl(FrontendServiceImpl.java:1376)
           at org.apache.doris.service.FrontendServiceImpl.loadTxnPreCommit(FrontendServiceImpl.java:1294)
           at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.doris.service.FeServer.lambda$start$0(FeServer.java:59)
           at org.apache.doris.service.FeServer$$Lambda$126/969432090.invoke(Unknown Source)
           at com.sun.proxy.$Proxy35.loadTxnPreCommit(Unknown Source)
           at org.apache.doris.thrift.FrontendService$Processor$loadTxnPreCommit.getResult(FrontendService.java:3092)
           at org.apache.doris.thrift.FrontendService$Processor$loadTxnPreCommit.getResult(FrontendService.java:3072)
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
           at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   
       Locked ownable synchronizers:
           - <0x0000fff483d426c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
           - <0x0000fff4c343f4b8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
   ```
   - the readLock is held by tabletReport; two threads are involved:
   
   ```
   "ForkJoinPool-1-worker-93" #7584 daemon prio=5 os_prio=0 tid=0x000ffee300c000 nid=0x4e51 in Object.wait() [0x0000ffee827fd000]
       java.lang.Thread.State: WAITING (on object monitor)
           at java.lang.Object.wait(Native Method)
           - waiting on <0x0000fff55aad70f8> (a java.util.stream.ForEachOps$ForEachTask)
           at java.util.concurrent.ForkJoinTask.internalWait(ForkJoinTask.java:311)
           - locked <0x0000fff55aad70f8> (a java.util.stream.ForEachOps$ForEachTask)
           at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2058)
           at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:404)
           at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
           at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
           at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
           at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
           at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
           at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
           at org.apache.doris.catalog.TabletInvertedIndex.lambda$tabletReport$1(TabletInvertedIndex.java:141)
           at org.apache.doris.catalog.TabletInvertedIndex$$Lambda$952/131508762.run(Unknown Source)
           at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
           at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
           at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
           at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
           at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
       
       Locked ownable synchronizers:
           - None
   ```
   
   ```
   "Thread-54" #90 daemon prio=5 os_prio=0 tid=0x0000fffd3ad32800 nid=0x13061 in Object.wait() [0x0000fff37d0fd000]
       java.lang.Thread.State: WAITING (on object monitor)
           at java.lang.Object.wait(Native Method)
           - waiting on <0x0000fff554117610> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
           at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
           - locked <0x0000fff554117610> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
           at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:391)
           at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
           at org.apache.doris.catalog.TabletInvertedIndex.tabletReport(TabletInvertedIndex.java:328)
           at org.apache.doris.master.ReportHandler.tabletReport(ReportHandler.java:436)
           at org.apache.doris.master.ReportHandler.access$600(ReportHandler.java:109)
           at org.apache.doris.master.ReportHandler$ReportTask.exec(ReportHandler.java:272)
           at org.apache.doris.master.ReportHandler.runOneCycle(ReportHandler.java:1311)
           at org.apache.doris.common.util.Daemon.run(Daemon.java:116)
   
       Locked ownable synchronizers:
           - None
   ```

