Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/07/24 20:45:37 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #42132: [SPARK-44528] Support proper usage of hasattr() for Connect dataframe

grundprinzip opened a new pull request, #42132:
URL: https://github.com/apache/spark/pull/42132

   ### What changes were proposed in this pull request?
   Currently, Spark Connect DataFrames do not support the proper use of Python's `hasattr()` to check whether an attribute is defined. This patch fixes that bug; `hasattr()` already works correctly in regular PySpark.
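
A minimal sketch of the mechanism, not the code from this patch: `hasattr(obj, name)` returns `False` only when attribute lookup raises `AttributeError`, so a `__getattr__` that hands back a column reference for any name makes every attribute appear to exist. The classes `LazyFrame` and `StrictFrame` are hypothetical stand-ins, plain strings stand in for `Column` objects, and the idea that the pre-patch Connect DataFrame behaved like `LazyFrame` is an assumption.

```python
class LazyFrame:
    """Hypothetical frame that resolves every unknown attribute lazily."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails. It never raises,
        # so hasattr() reports True for any name at all.
        return f"Column<{name}>"


class StrictFrame(LazyFrame):
    """Hypothetical frame that rejects names that are not columns."""

    def __getattr__(self, name):
        if name not in self.columns:
            # Raising AttributeError is what lets hasattr() return False.
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{name}'"
            )
        return f"Column<{name}>"


lazy = LazyFrame(["id"])
strict = StrictFrame(["id"])
strict._simple_extension = 10  # real instance attributes bypass __getattr__

print(hasattr(lazy, "no_such_attr"))         # True
print(hasattr(strict, "no_such_attr"))       # False
print(hasattr(strict, "id"))                 # True
print(hasattr(strict, "_simple_extension"))  # True
```

Because `__getattr__` is consulted only after normal lookup fails, attributes set directly on the instance keep working in both variants; only the answer for unknown names changes.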
   
   ### Why are the changes needed?
   Bug fix.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #42132:
URL: https://github.com/apache/spark/pull/42132#discussion_r1273017921


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1584,8 +1584,16 @@ def __getattr__(self, name: str) -> "Column":
                 error_class="NOT_IMPLEMENTED",
                 message_parameters={"feature": f"{name}()"},
             )
+
+        if name not in self.columns:

Review Comment:
   How about using `elif` here?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42132:
URL: https://github.com/apache/spark/pull/42132#discussion_r1273054072


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1584,8 +1584,16 @@ def __getattr__(self, name: str) -> "Column":
                 error_class="NOT_IMPLEMENTED",
                 message_parameters={"feature": f"{name}()"},
             )
+
+        if name not in self.columns:

Review Comment:
   Yeah, but this matches the non-Spark Connect side. I think it's fine to leave the two consistent.
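
For what it's worth, the two spellings should behave identically here, assuming the preceding branch always ends in a `raise`, as the diff context suggests. A small sketch of that equivalence follows; the helper names (`resolve_if`, `resolve_elif`, `outcome`) and the `UNSUPPORTED` names are hypothetical, and built-in exceptions stand in for the PySpark error classes.

```python
UNSUPPORTED = ("unsupported_a", "unsupported_b")  # illustrative names only
COLUMNS = ["id"]


def resolve_if(name):
    if name in UNSUPPORTED:
        raise NotImplementedError(f"{name}() is not implemented")
    # Reached only when the branch above did not raise.
    if name not in COLUMNS:
        raise AttributeError(name)
    return f"Column<{name}>"


def resolve_elif(name):
    if name in UNSUPPORTED:
        raise NotImplementedError(f"{name}() is not implemented")
    elif name not in COLUMNS:
        raise AttributeError(name)
    return f"Column<{name}>"


def outcome(func, name):
    """Return the result, or the name of the exception type that was raised."""
    try:
        return func(name)
    except (NotImplementedError, AttributeError) as exc:
        return type(exc).__name__


for name in ("id", "unsupported_a", "missing"):
    print(name, outcome(resolve_if, name), outcome(resolve_elif, name))
```

Since a raising branch never falls through, the choice between `if` and `elif` is purely stylistic in this shape, which leaves consistency with the non-Connect implementation as the deciding factor.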





[GitHub] [spark] HyukjinKwon commented on pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42132:
URL: https://github.com/apache/spark/pull/42132#issuecomment-1652697033

   Merged to master and branch-3.5.




[GitHub] [spark] grundprinzip commented on pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #42132:
URL: https://github.com/apache/spark/pull/42132#issuecomment-1651190737

   @HyukjinKwon I fixed the remaining failing test. PTAL again :) Thanks




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42132:
URL: https://github.com/apache/spark/pull/42132#discussion_r1272947839


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -157,6 +157,21 @@ def spark_connect_clean_up_test_data(cls):
 
 
 class SparkConnectBasicTests(SparkConnectSQLTestCase):
+
+    def test_df_getattr_behavior(self):
+        cdf = self.connect.range(10)
+        sdf = self.spark.range(10)
+
+        sdf._simple_extension = 10
+        cdf._simple_extension = 10
+
+        self.assertEqual(sdf._simple_extension, cdf._simple_extension)
+        self.assertEqual(type(sdf._simple_extension), type(cdf._simple_extension))
+
+        self.assertTrue(hasattr(cdf, "_simple_extension"))
+        self.assertFalse(hasattr(cdf, "_simple_extension_does_not_exsit"))

Review Comment:
   Argh, this needs the reformat script to be run.





[GitHub] [spark] grundprinzip commented on pull request #42132: [SPARK-44528] Support proper usage of hasattr() for Connect dataframe

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #42132:
URL: https://github.com/apache/spark/pull/42132#issuecomment-1648586308

   @HyukjinKwon PTAL please.




[GitHub] [spark] beliefer commented on a diff in pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #42132:
URL: https://github.com/apache/spark/pull/42132#discussion_r1273083539


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1584,8 +1584,16 @@ def __getattr__(self, name: str) -> "Column":
                 error_class="NOT_IMPLEMENTED",
                 message_parameters={"feature": f"{name}()"},
             )
+
+        if name not in self.columns:

Review Comment:
   I checked PySpark, and it is as you said. +1 to your comment.





[GitHub] [spark] HyukjinKwon closed pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42132: [SPARK-44528][CONNECT] Support proper usage of hasattr() for Connect dataframe
URL: https://github.com/apache/spark/pull/42132

