Posted to dev@linkis.apache.org by Andy <an...@163.com> on 2022/05/09 09:16:04 UTC

[DISCUSS] The python engine has been in a busy state


————— 2022-5-8 —————

Heisenberg 11:52
@peacewong@WDS Brother Ping, I need help with a problem. I've been looking at it for a long time and have no ideas. The python engine stays in the busy state, so its process is never released. No rush, just take a look when you have time.

peacewong@WDS 11:55
Okay, I'll turn on the computer later and have a look

Heisenberg 11:59
Mmm thanks 😁

peacewong@WDS 12:52
The problem is probably caused by this: the python interpreter exited abnormally, which left the execution stuck and never getting a return. You can search the jstack output for this lock address: 0x00000000c1eb19e0. The unlock never succeeds because it is waiting for this lock, so the state stays busy.
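
The jstack search suggested above can be scripted. The following is a minimal sketch, assuming the usual jstack text layout (quoted thread name on the header line, then stack lines such as "- locked <0x...>" or "- waiting to lock <0x...>"). The function name and the two-thread sample dump are made up for illustration and are not taken from the actual Linkis dump.

```python
import re

# Matches jstack lock lines such as "- locked <0x...>" or "- waiting to lock <0x...>".
LOCK_LINE = re.compile(
    r"-\s+(locked|waiting to lock|waiting on|parking to wait for)\s+<(0x[0-9a-f]+)>"
)


def threads_touching_lock(dump_text, lock_address):
    """Return (thread_name, action) pairs for every stack line that mentions the lock."""
    hits = []
    current_thread = None
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # Thread header lines start with the quoted thread name.
            current_thread = line.split('"')[1]
            continue
        match = LOCK_LINE.search(line)
        if match and match.group(2) == lock_address.lower():
            hits.append((current_thread, match.group(1)))
    return hits


# Made-up dump, only to show the shape of the input (not real Linkis output).
SAMPLE_DUMP = '''
"task-runner" #31 prio=5 os_prio=0 tid=0x1 nid=0x2 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- locked <0x00000000c1eb19e0> (a java.lang.Object)

"unlock-handler" #47 prio=5 os_prio=0 tid=0x3 nid=0x4 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
	- waiting to lock <0x00000000c1eb19e0> (a java.lang.Object)
'''

for name, action in threads_touching_lock(SAMPLE_DUMP, "0x00000000c1eb19e0"):
    print(f"{name}: {action}")
```

A thread reported as "locked" holds the monitor, while one reported as "waiting to lock" is blocked on it. That pairing is the pattern to look for when an unlock call never returns.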

peacewong@WDS 12:52
This was improved in 1.1.1: when a python task is killed, the interpreter is closed as well.

Heisenberg 13:03
Is the python interpreter the same concept as the python engine process?

peacewong@WDS 13:06
No. The python interpreter is the python process that executes the python code; the python engine is the engine conn, which is responsible for starting the interpreter, talking to the entrance, and so on.

Heisenberg 13:11
Ok, thank you brother Ping, I will add this to the version we use first

Heisenberg 13:23
Brother Ping, which engine instance is the script executed by? Is there a record of such information?

Heisenberg 13:23
@peacewong @WDS

peacewong@WDS 13:25
Brother Longping, this feature is planned for version 1.1.3, and no one has claimed it yet.

Heisenberg 13:29
So the root cause of the always-busy python engine I ran into is that the user killed the task, the python interpreter was not closed, and the python engine was left in the busy state?
And this does not happen if the statement simply fails with an error during execution.

Heisenberg 13:30
I'm not sure whether I understood that correctly, Brother Ping

Heisenberg 13:31
I looked at the user's execution record. At one point the user killed the task, and after that the following statements were no longer submitted to the original engine (because the original engine was busy); a new engine was started instead.

peacewong@WDS 13:31
Yes, this normally doesn't happen. From the stack and log above, it is because the unlock got stuck. This problem no longer exists from 1.1.1 on.

peacewong@WDS 13:31
yes
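
The cause confirmed above (task killed, interpreter left running, engine stuck in busy) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyEngine, its state strings, and the close_interpreter flag are hypothetical and do not mirror the Linkis classes. The "interpreter" is simply a child Python process, and the waiting thread is woken by end-of-file on its stdout rather than by the mechanism Linkis uses.

```python
import subprocess
import sys
import threading


class ToyEngine:
    """Hypothetical stand-in for an engine conn that owns one interpreter process."""

    def __init__(self):
        self.state = "Idle"
        self.interpreter = None
        self.worker = None

    def execute(self, code):
        self.state = "Busy"
        # The interpreter is a separate OS process, distinct from the engine itself.
        self.interpreter = subprocess.Popen(
            [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
        )
        self.worker = threading.Thread(target=self._wait_for_result, daemon=True)
        self.worker.start()

    def _wait_for_result(self):
        # Blocks until the interpreter closes its stdout, i.e. until it exits.
        self.interpreter.stdout.read()
        self.state = "Idle"

    def kill(self, close_interpreter):
        if close_interpreter:
            self.interpreter.kill()
            self.interpreter.wait()
        # Without the close, nothing ever wakes the waiting worker thread.


engine = ToyEngine()
engine.execute("import time; time.sleep(30)")

# Kill the task but leave the interpreter alone: the engine stays busy.
engine.kill(close_interpreter=False)
engine.worker.join(timeout=0.5)
state_without_close = engine.state

# Kill the task and close the interpreter: the worker returns and the engine frees up.
engine.kill(close_interpreter=True)
engine.worker.join(timeout=10)
state_with_close = engine.state

print(state_without_close, state_with_close)
```

The model also shows why later statements went to a new engine: as long as the worker is blocked, the original engine never leaves the busy state and cannot accept more work.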

Heisenberg 13:35
OK, got it, the cause is clear now. Can 1.0.3 be patched based on the 1.1.1 code? I don't know whether the change is large. I ask because upgrading from 1.0.3 to 1.1.x will still take quite a long time in our internal release cycle.

peacewong@WDS 13:37
Yes, the changes are small

Heisenberg 13:37
okay

Heisenberg 13:41
@peacewong@WDS I reproduced the problem right away, Brother Ping

Heisenberg 13:41
[Laughing through tears] Brother Ping 666

Heisenberg 13:42
Click run, then kill, and it reproduces

Heisenberg 15:52
@peacewong@WDS Brother Ping, it still doesn't seem to work after closing the pythonSession

peacewong@WDS 16:14
Can you check where the stack is stuck?

Heisenberg 16:15
Found the cause

Heisenberg 16:15
This line needs to be commented out

Heisenberg 16:16
I missed something when applying the fix

Heisenberg 16:18
I'll try it first and see if it works

peacewong@WDS 16:24
OK

Heisenberg 16:44
locked location

Heisenberg 16:44
@peacewong @WDS

Heisenberg 16:45
?

Heisenberg 16:46
I don't know why: the task has been killed and the pythonSession has been killed too, but the engine status is still busy

Heisenberg 16:47
0x00000000f02d8e80 I suspect this lock is why the engine stays busy

peacewong@WDS 16:59
Check whether this part was executed

Heisenberg 17:12
The implementation here still differs somewhat between 1.0.3 and 1.1.1.

Heisenberg 17:13
The line "PythonExecutor has stopped with exit code" was executed

peacewong@WDS 17:14
It needs to be added

peacewong@WDS 17:14
The line above is the critical one; it interrupts the stuck thread shown above.
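
The idea in this message (release the stuck thread once the interpreter has exited) can be sketched roughly as follows. This is an illustration, not the Linkis PythonSession code: PendingResult and watch_interpreter are hypothetical names. Python has no equivalent of interrupting a JVM thread, so a threading.Event stands in for the interrupt.

```python
import subprocess
import sys
import threading


class PendingResult:
    """What the execution thread blocks on until the interpreter reports back."""

    def __init__(self):
        self._done = threading.Event()
        self.value = None
        self.error = None

    def complete(self, value):
        if not self._done.is_set():
            self.value = value
            self._done.set()

    def fail(self, error):
        # Ignored if a result has already arrived.
        if not self._done.is_set():
            self.error = error
            self._done.set()

    def wait(self, timeout=None):
        return self._done.wait(timeout)


def watch_interpreter(process, pending):
    """Wait for the interpreter to exit, then release whoever is still waiting."""
    exit_code = process.wait()
    pending.fail(f"interpreter has stopped with exit code {exit_code}")


# Demo: an interpreter that dies without ever reporting a result.
process = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
pending = PendingResult()
threading.Thread(
    target=watch_interpreter, args=(process, pending), daemon=True
).start()

released = pending.wait(timeout=10)
print(released, pending.error)
```

Without the watcher, the wait in the demo would only end on its timeout, which corresponds to the engine staying busy after the interpreter is gone.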

Heisenberg 17:15
OK, I'll modify the PythonSession implementation in 1.0.3 to match 1.1.1

peacewong@WDS 17:16
[OK]

Heisenberg 19:02
@peacewong@WDS Brother Ping, the problem is solved

peacewong@WDS 19:04
OK, you can upgrade it later

Heisenberg 19:05
Uh-huh






