Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/09 18:30:06 UTC

[GitHub] [spark] xinrong-meng opened a new pull request, #38584: [SPARK-40281] Memory Profiler on Executors

xinrong-meng opened a new pull request, #38584:
URL: https://github.com/apache/spark/pull/38584

   ### What changes were proposed in this pull request?
   Introduce memory profiling on executors.
   
   ### Why are the changes needed?
   See the [design document](https://docs.google.com/document/d/e/2PACX-1vQLphItWY-WYO32ZQwtBpYbagqfep_Hk-cL_-UV8r6tiYFMp1QDJPGNmBEi-xBp_vlkcCMCW0hDBI6j/pub) for details.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
ueshin commented on code in PR #38584:
URL: https://github.com/apache/spark/pull/38584#discussion_r1019507290


##########
python/pyspark/tests/test_memory_profiler.py:
##########
@@ -0,0 +1,160 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import sys
+import tempfile
+import unittest
+import warnings
+from io import StringIO
+from typing import Iterator
+from unittest import mock
+
+import pandas as pd
+
+from pyspark import SparkConf, SparkContext
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import pandas_udf, udf
+from pyspark.testing.utils import PySparkTestCase
+
+try:
+    import memory_profiler  # type: ignore[import] # noqa: F401
+
+    has_memory_profiler = True
+except Exception:
+    has_memory_profiler = False
+
+
+@unittest.skipIf(not has_memory_profiler, "Must have memory-profiler installed.")
+class MemoryProfilerTests(PySparkTestCase):

Review Comment:
   Do we also need to update `dev/sparktestsupport/modules.py`?
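
The import guard and `skipIf` decorator in the quoted diff can be sketched as a standalone example. This is a minimal sketch, not the PR's test file: the class `OptionalDependencyExample` and the helper `run_example` are hypothetical names, and plain `unittest.TestCase` stands in for `PySparkTestCase` so that it runs without Spark.

```python
import unittest

# Mirror the guard from the quoted diff: record whether the optional
# third-party package is importable instead of failing at import time.
try:
    import memory_profiler  # noqa: F401

    has_memory_profiler = True
except Exception:
    has_memory_profiler = False


# Hypothetical test case; the whole class is skipped when the package is absent.
@unittest.skipIf(not has_memory_profiler, "Must have memory-profiler installed.")
class OptionalDependencyExample(unittest.TestCase):
    def test_dependency_available(self):
        # Only reached when the import above succeeded.
        self.assertTrue(has_memory_profiler)


def run_example() -> unittest.TestResult:
    """Run the example test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencyExample)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_example()
    print("skipped:", len(outcome.skipped), "successful:", outcome.wasSuccessful())
```

With this pattern the test is reported as skipped rather than failed on machines where `memory-profiler` is not installed.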





[GitHub] [spark] HyukjinKwon closed pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors
URL: https://github.com/apache/spark/pull/38584




[GitHub] [spark] mridulm commented on pull request #38584: [SPARK-40281] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1309300022

   +CC @zhouyejoe 




[GitHub] [spark] xinrong-meng commented on pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
xinrong-meng commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1309743285

   @ueshin May I ask for your review? Thanks!




[GitHub] [spark] xinrong-meng commented on pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
xinrong-meng commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1309695390

   Thank you @HyukjinKwon! I will file a separate PR for the comprehensive documentation.




[GitHub] [spark] HyukjinKwon commented on pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1311168746

   Merged to master.




[GitHub] [spark] AmplabJenkins commented on pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1309608751

   Can one of the admins verify this patch?




[GitHub] [spark] Tagar commented on pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
Tagar commented on PR #38584:
URL: https://github.com/apache/spark/pull/38584#issuecomment-1310665323

   cc @LucaCanali 




[GitHub] [spark] xinrong-meng commented on a diff in pull request #38584: [SPARK-40281][PYTHON] Memory Profiler on Executors

Posted by GitBox <gi...@apache.org>.
xinrong-meng commented on code in PR #38584:
URL: https://github.com/apache/spark/pull/38584#discussion_r1019525200


##########
python/pyspark/tests/test_memory_profiler.py:
##########
@@ -0,0 +1,160 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import sys
+import tempfile
+import unittest
+import warnings
+from io import StringIO
+from typing import Iterator
+from unittest import mock
+
+import pandas as pd
+
+from pyspark import SparkConf, SparkContext
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import pandas_udf, udf
+from pyspark.testing.utils import PySparkTestCase
+
+try:
+    import memory_profiler  # type: ignore[import] # noqa: F401
+
+    has_memory_profiler = True
+except Exception:
+    has_memory_profiler = False
+
+
+@unittest.skipIf(not has_memory_profiler, "Must have memory-profiler installed.")
+class MemoryProfilerTests(PySparkTestCase):

Review Comment:
   Good catch! Updated.


