You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2023/07/03 01:28:04 UTC

[GitHub] [spark] panbingkun opened a new pull request, #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

panbingkun opened a new pull request, #41823:
URL: https://github.com/apache/spark/pull/41823

   ### What changes were proposed in this pull request?
   The pr aims to enable SeriesStringTests.test_string_replace for pandas 2.0.0.
   
   ### Why are the changes needed?
   Improve UT coverage.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   - Pass GA.
   - Manually test:
   '''
   (base) panbingkun:~/Developer/spark/spark-community$python/run-tests --testnames 'pyspark.pandas.tests.test_series_string SeriesStringTests.test_string_replace'
   Running PySpark tests. Output is in /Users/panbingkun/Developer/spark/spark-community/python/unit-tests.log
   Will test against the following Python executables: ['python3.9']
   Will test the following Python tests: ['pyspark.pandas.tests.test_series_string SeriesStringTests.test_string_replace']
   python3.9 python_implementation is CPython
   python3.9 version is: Python 3.9.13
   Starting test(python3.9): pyspark.pandas.tests.test_series_string SeriesStringTests.test_string_replace (temp output: /Users/panbingkun/Developer/spark/spark-community/python/target/d51a913a-b400-4d1b-adb3-97765bb463bd/python3.9__pyspark.pandas.tests.test_series_string_SeriesStringTests.test_string_replace__izk1fx8o.log)
   Finished test(python3.9): pyspark.pandas.tests.test_series_string SeriesStringTests.test_string_replace (13s)
   Tests passed in 13 seconds
   '''


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] panbingkun commented on pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #41823:
URL: https://github.com/apache/spark/pull/41823#issuecomment-1617089794

   cc @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
URL: https://github.com/apache/spark/pull/41823


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41823:
URL: https://github.com/apache/spark/pull/41823#discussion_r1252417967


##########
python/pyspark/pandas/tests/test_series_string.py:
##########
@@ -259,10 +255,11 @@ def test_string_replace(self):
         def repl(m):
             return m.group(0)[::-1]
 
-        self.check_func(lambda x: x.str.replace(r"[a-z]+", repl))
+        regex_pat = re.compile(r"[a-z]+")
+        self.check_func(lambda x: x.str.replace(regex_pat, repl, regex=True))
         # compiled regex with flags
         regex_pat = re.compile(r"WHITESPACE", flags=re.IGNORECASE)
-        self.check_func(lambda x: x.str.replace(regex_pat, "---"))

Review Comment:
   Okay, let's get this reverted for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #41823:
URL: https://github.com/apache/spark/pull/41823#issuecomment-1617441982

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #41823:
URL: https://github.com/apache/spark/pull/41823#issuecomment-1617398781

   cc @itholic @HyukjinKwon @zhengruifeng FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41823:
URL: https://github.com/apache/spark/pull/41823#discussion_r1250464460


##########
python/pyspark/pandas/tests/test_series_string.py:
##########
@@ -259,10 +255,11 @@ def test_string_replace(self):
         def repl(m):
             return m.group(0)[::-1]
 
-        self.check_func(lambda x: x.str.replace(r"[a-z]+", repl))
+        regex_pat = re.compile(r"[a-z]+")
+        self.check_func(lambda x: x.str.replace(regex_pat, repl, regex=True))
         # compiled regex with flags
         regex_pat = re.compile(r"WHITESPACE", flags=re.IGNORECASE)
-        self.check_func(lambda x: x.str.replace(regex_pat, "---"))

Review Comment:
   Actually, I think we should also make the behaviour the same with pandas 2.0.0 at pandas API on Spark too. I forgot how we concluded this. Right @itholic ? and should upgrade migration guide too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] panbingkun commented on pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #41823:
URL: https://github.com/apache/spark/pull/41823#issuecomment-1617090216

   https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #41823: [SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41823:
URL: https://github.com/apache/spark/pull/41823#discussion_r1250536265


##########
python/pyspark/pandas/tests/test_series_string.py:
##########
@@ -259,10 +255,11 @@ def test_string_replace(self):
         def repl(m):
             return m.group(0)[::-1]
 
-        self.check_func(lambda x: x.str.replace(r"[a-z]+", repl))
+        regex_pat = re.compile(r"[a-z]+")
+        self.check_func(lambda x: x.str.replace(regex_pat, repl, regex=True))
         # compiled regex with flags
         regex_pat = re.compile(r"WHITESPACE", flags=re.IGNORECASE)
-        self.check_func(lambda x: x.str.replace(regex_pat, "---"))

Review Comment:
   Yeah, as mentioned in https://github.com/apache/spark/pull/41824/files#r1250524246, it is advisable to wait for the Spark 4.0 release and address the behavior instead of just making changes to pass the tests. Just created a new ticket SPARK-44276 in order not to forget.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org