Posted to reviews@spark.apache.org by "wbo4958 (via GitHub)" <gi...@apache.org> on 2024/01/24 03:38:08 UTC

Re: [PR] [SPARK-46812][SQL][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

wbo4958 commented on PR #44852:
URL: https://github.com/apache/spark/pull/44852#issuecomment-1907300220

   I was trying to add unit tests to check whether the ResourceProfile is correctly applied to the underlying RDD generated in MapInPandasExec. Here is my test code:
   
   ``` python
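   from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests

   # `df` is assumed to have a single `id long` column, e.g. spark.range(10).
   # Without a profile, the underlying RDD should carry no ResourceProfile.
   df1 = df.mapInPandas(lambda batches: batches, "id long")
   assert df1.rdd.getResourceProfile() is None

   # Require 2 CPUs per task; `build` is a property, so no call parentheses.
   treqs = TaskResourceRequests().cpus(2)
   expected_rp = ResourceProfileBuilder().require(treqs).build

   # Positional args: func, schema, barrier=False, profile=expected_rp.
   df2 = df.mapInPandas(lambda batches: batches, "id long", False, expected_rp)
   # This assertion fails (see below).
   assert df2.rdd.getResourceProfile() is not None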
   ```
   
   However, `df2.rdd.getResourceProfile()` returns None. The reason is that `df2.rdd` adds some extra MapPartitionsRDDs on top, and those don't have the ResourceProfile attached.
   
   I also tried to reach the correct parent RDD through the JVM RDD with the code below:
   
   ``` python
   df2.rdd._jrdd.firstParent()
   ```
   
   or
   
   ``` python
   df2.rdd._jrdd.parent(0)
   ```
   
   Neither of them worked; they failed with the following errors:
   
   ``` console
   py4j.protocol.Py4JError: An error occurred while calling o45.parent. Trace:
   py4j.Py4JException: Method parent([class java.lang.Integer]) does not exist
   
   py4j.protocol.Py4JError: An error occurred while calling o45.firstParent. Trace:
   py4j.Py4JException: Method firstParent([]) does not exist
   ```
   
   I'm not sure how to add unit tests for this PR, so I will test it manually instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

