Posted to issues@spark.apache.org by "Attila Zsolt Piros (Jira)" <ji...@apache.org> on 2021/04/09 15:52:00 UTC

[jira] [Commented] (SPARK-35009) Avoid creating multiple Monitor threads for reused python workers for the same TaskContext

    [ https://issues.apache.org/jira/browse/SPARK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318097#comment-17318097 ] 

Attila Zsolt Piros commented on SPARK-35009:
--------------------------------------------

I am working on this

> Avoid creating multiple Monitor threads for reused python workers for the same TaskContext
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35009
>                 URL: https://issues.apache.org/jira/browse/SPARK-35009
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2, 3.0.3
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>
> Currently the code below crashes because of the high number of Monitor threads created:
> {noformat}
>     import pyspark
>     conf = pyspark.SparkConf().setMaster("local[64]").setAppName("Test1")
>     sc = pyspark.SparkContext.getOrCreate(conf)
>     # one partition (and hence one task) per element
>     rows = 70000
>     data = list(range(rows))
>     rdd = sc.parallelize(data, rows)
>     assert rdd.getNumPartitions() == rows
>     rdd0 = rdd.filter(lambda x: False)
>     assert rdd0.getNumPartitions() == rows
>     # a single coalesced task now computes all 70000 parent partitions,
>     # reusing the same python worker under one TaskContext
>     rdd00 = rdd0.coalesce(1)
>     data = rdd00.collect()
>     assert data == []
> {noformat}
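>
> The coalesce(1) is what triggers it: the single coalesced task computes all 70000 parent partitions through the same reused python worker, and a new Monitor thread is started for every reuse even though the TaskContext is the same, so the JVM accumulates tens of thousands of threads. Below is a minimal sketch of the intended guard, deduplicating on the (TaskContext, worker) pair; the names are illustrative stand-ins, not Spark's actual internals:
> {noformat}
> import threading
>
> # Hypothetical stand-ins for Spark's internals: start at most one monitor
> # thread per (task, worker) pair instead of one per worker reuse.
> monitors = {}
> monitors_lock = threading.Lock()
>
> def monitor(stop_event):
>     # The real MonitorThread watches the task; here we just block until done.
>     stop_event.wait()
>
> def start_monitor_once(task_id, worker_id, stop_event):
>     key = (task_id, worker_id)
>     with monitors_lock:
>         if key not in monitors:
>             t = threading.Thread(target=monitor, args=(stop_event,), daemon=True)
>             t.start()
>             monitors[key] = t
>     return monitors[key]
>
> stop = threading.Event()
> # One coalesced task touching 70000 parent partitions on one reused worker:
> for _ in range(70000):
>     start_monitor_once(task_id=0, worker_id=0, stop_event=stop)
> assert threading.active_count() == 2  # main thread + a single monitor
> stop.set()
> {noformat}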
> The error is:
> {noformat}
> 21/04/08 12:12:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 21/04/08 12:12:29 WARN TaskSetManager: Stage 0 contains a task of very large size (4732 KiB). The maximum recommended task size is 1000 KiB.
> [Stage 0:>                                                          (0 + 1) / 1][423.190s][warning][os,thread] Attempt to protect stack guard pages failed (0x00007f43d23ff000-0x00007f43d2403000).
> [423.190s][warning][os,thread] Attempt to deallocate stack guard pages failed.
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f43d300b000, 16384, 0) failed; error='Not enough space' (errno=12)
> [423.231s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 16384 bytes for committing reserved memory.
> # An error report file with more information is saved as:
> # /home/ubuntu/PycharmProjects/<projekt-dir>/tests/hs_err_pid17755.log
> [thread 17966 also had an error]
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4b7bd81000, 262144, 0) failed; error='Not enough space' (errno=12)
> ERROR:root:Exception while sending command.
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1207, in send_command
>     raise Py4JNetworkError("Answer from Java side is empty")
> py4j.protocol.Py4JNetworkError: Answer from Java side is empty
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1033, in send_command
>     response = connection.send_command(command)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1211, in send_command
>     raise Py4JNetworkError(
> py4j.protocol.Py4JNetworkError: Error while receiving
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:42439)
> Traceback (most recent call last):
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>     return_value = get_return_value(
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, in get_return_value
>     raise Py4JError(
> py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 977, in _get_connection
>     connection = self.deque.pop()
> IndexError: pop from an empty deque
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1115, in start
>     self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
> Traceback (most recent call last):
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>     return_value = get_return_value(
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, in get_return_value
>     raise Py4JError(
> py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 977, in _get_connection
>     connection = self.deque.pop()
> IndexError: pop from an empty deque
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1115, in start
>     self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "<input>", line 3, in <module>
>   File "/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
>     pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
>   File "/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", line 992, in <module>
>     test_70000()
>   File "/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", line 974, in test_70000
>     data=rdd00.collect()
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/pyspark/traceback_utils.py", line 78, in __exit__
>     self._context._jsc.setCallSite(None)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1303, in __call__
>     answer = self.gateway_client.send_command(command)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1031, in send_command
>     connection = self._get_connection()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 979, in _get_connection
>     connection = self._create_connection()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 985, in _create_connection
>     connection.start()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1127, in start
>     raise Py4JNetworkError(msg, e)
> py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:42439)
> {noformat}
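>
> The decisive lines in the log are the pthread_create failure with EAGAIN and os::commit_memory failing with "Not enough space" (errno=12): the JVM cannot allocate stacks for any more native threads. At the reported 1024k stack size, tens of thousands of Monitor threads reserve tens of GiB of stack address space, so the process dies on native memory well before the heap becomes a problem. One way to watch the thread count grow while the job runs is to poll java.lang.Thread through the py4j gateway (a rough diagnostic sketch; sc._jvm is an internal handle, and this is only meaningful in local mode, where the driver and the monitor threads share one JVM):
> {noformat}
> import pyspark
>
> conf = pyspark.SparkConf().setMaster("local[64]").setAppName("Test1")
> sc = pyspark.SparkContext.getOrCreate(conf)
>
> # java.lang.Thread.activeCount() reports live threads in the current group;
> # call it periodically (e.g. from a second REPL) during the collect().
> print(sc._jvm.java.lang.Thread.activeCount())
> {noformat}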



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
