Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/11 08:04:07 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

zhengruifeng opened a new pull request, #38621:
URL: https://github.com/apache/spark/pull/38621

   ### What changes were proposed in this pull request?
   Implement `DataFrame.show` in the Spark Connect Python client.
   
   
   ### Why are the changes needed?
   API coverage for the Spark Connect Python client.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, `DataFrame.show` becomes available in the Spark Connect Python client.
   
   
   ### How was this patch tested?
   Added a unit test.




[GitHub] [spark] cloud-fan closed pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`
URL: https://github.com/apache/spark/pull/38621




[GitHub] [spark] zhengruifeng commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020987941


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   I prefer to make it `required` here.
   We cannot invoke the `show` methods, since they just print and return `Unit`.
   To get a string back, we need to invoke `showString`, which requires all of these parameters.
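
   As a side note, a minimal sketch of the distinction described above, assuming a Spark Connect `SparkSession` bound to the name `spark` (`show` and `_show_string` are the methods this PR adds on the Python side):

   ```python
   # Minimal usage sketch -- `spark` is assumed to be an existing Spark Connect session.
   df = spark.sql("SELECT 1 AS X, 2 AS Y")

   s = df._show_string(n=5, truncate=True, vertical=False)
   print(type(s))      # <class 'str'>: the rendered table, computed via Dataset.showString

   result = df.show()  # prints the table to stdout ...
   print(result)       # ... and returns None, mirroring Dataset.show returning Unit
   ```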





[GitHub] [spark] zhengruifeng commented on pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38621:
URL: https://github.com/apache/spark/pull/38621#issuecomment-1311414153

   cc @HyukjinKwon @cloud-fan @amaliujia 




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020207990


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -388,8 +388,55 @@ def sample(
             session=self._session,
         )
 
-    def show(self, n: int, truncate: Optional[Union[bool, int]], vertical: Optional[bool]) -> None:
-        ...
+    def _show_string(
+        self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False
+    ) -> str:
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
+        _truncate: int = -1
+        if isinstance(truncate, bool) and truncate:
+            _truncate = 20
+        else:
+            try:
+                _truncate = int(truncate)
+            except ValueError:
+                raise TypeError(
+                    "Parameter 'truncate={}' should be either bool or int.".format(truncate)
+                )
+
+        pdf = DataFrame.withPlan(
+            plan.ShowString(child=self._plan, numRows=n, truncate=_truncate, vertical=vertical),
+            session=self._session,
+        ).toPandas()
+        assert pdf is not None
+        return pdf.iloc[0, 0]
+
+    def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None:
+        """
+        Prints the first ``n`` rows to the console.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        n : int, optional
+            Number of rows to show.
+        truncate : bool or int, optional
+            If set to ``True``, truncate strings longer than 20 chars by default.
+            If set to a number greater than one, truncates long strings to length ``truncate``
+            and align cells right.
+        vertical : bool, optional
+            If set to ``True``, print output rows vertically (one line
+            per column value).
+
+        Returns
+        -------
+        None
+        """

Review Comment:
   I think we can just remove
   ```suggestion
   ```
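
   The truncate handling in the `_show_string` hunk above can also be read in isolation. A small sketch of that normalization, with the helper name `normalize_truncate` being purely illustrative (it is not part of the PR):

   ```python
   # Illustrative restatement of the truncate normalization in _show_string above.
   from typing import Union

   def normalize_truncate(truncate: Union[bool, int]) -> int:
       if isinstance(truncate, bool) and truncate:
           return 20              # True -> default cutoff of 20 characters
       try:
           return int(truncate)   # False -> 0 (no truncation); plain ints pass through
       except ValueError:
           raise TypeError(
               "Parameter 'truncate={}' should be either bool or int.".format(truncate)
           )

   assert normalize_truncate(True) == 20
   assert normalize_truncate(False) == 0
   assert normalize_truncate(5) == 5
   ```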





[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020483723


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   I think we should have a proto message style guide soon that people can agree on.
   
   For example, marking a required field with the `optional` notion is very confusing.
   
   IMO a required field does not need `optional` to tell set from unset. We say it is required, which means clients should always provide a value, and the server side just uses it.
   
   





[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020488570


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   r'''
   +---+---+
   |  X|  Y|
   +---+---+
   |  1|  2|
   +---+---+
   '''
   
   Does this get rid of the one line with embedded `\n`, which is a bit hard to read?
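
   One lint-friendly way to express the same expected value, sketched here purely for illustration (it is not necessarily the formulation the test ends up with), is to keep each row as a short list item and join them:

   ```python
   # Equivalent to the single-line expected string above, without the embedded "\n".
   expected = "\n".join(
       [
           "+---+---+",
           "|  X|  Y|",
           "+---+---+",
           "|  1|  2|",
           "+---+---+",
           "",  # trailing newline produced by showString
       ]
   )
   assert expected == "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"
   ```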





[GitHub] [spark] zhengruifeng commented on pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38621:
URL: https://github.com/apache/spark/pull/38621#issuecomment-1313087872

   thank you @HyukjinKwon @amaliujia @cloud-fan for the reviews




[GitHub] [spark] zhengruifeng commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020992753


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   I think we cannot make such a change, since it breaks the lint.





[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1021008154


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   If this ever becomes a readability issue, we can add a comment showing the readable format.
   
   No need to worry about it now.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020987966


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   let me have a try





[GitHub] [spark] cloud-fan commented on pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38621:
URL: https://github.com/apache/spark/pull/38621#issuecomment-1313086271

   thanks, merging to master!




[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020483723


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   I think we should have a proto message style guide ASAP that people can agree on.
   
   For example, marking a required field with the `optional` notion is very confusing.
   
   IMO a required field does not need `optional` to tell set from unset. We say it is required, which means clients should always provide a value, and the server side just uses it (message-typed fields are the only exception: if one is not set, it gives a NULL, which is problematic for JVM languages because it leads to an NPE).
   
   





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020208301


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -388,8 +388,55 @@ def sample(
             session=self._session,
         )
 
-    def show(self, n: int, truncate: Optional[Union[bool, int]], vertical: Optional[bool]) -> None:
-        ...
+    def _show_string(
+        self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False
+    ) -> str:
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
+        _truncate: int = -1
+        if isinstance(truncate, bool) and truncate:
+            _truncate = 20
+        else:
+            try:
+                _truncate = int(truncate)
+            except ValueError:
+                raise TypeError(
+                    "Parameter 'truncate={}' should be either bool or int.".format(truncate)
+                )
+
+        pdf = DataFrame.withPlan(
+            plan.ShowString(child=self._plan, numRows=n, truncate=_truncate, vertical=vertical),
+            session=self._session,
+        ).toPandas()
+        assert pdf is not None
+        return pdf.iloc[0, 0]
+
+    def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None:
+        """
+        Prints the first ``n`` rows to the console.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        n : int, optional
+            Number of rows to show.
+        truncate : bool or int, optional
+            If set to ``True``, truncate strings longer than 20 chars by default.
+            If set to a number greater than one, truncates long strings to length ``truncate``
+            and align cells right.
+        vertical : bool, optional
+            If set to ``True``, print output rows vertically (one line
+            per column value).
+
+        Returns
+        -------
+        None

Review Comment:
   I think we can just remove
   ```suggestion
   ```
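
   For quick reference, a hedged usage sketch of the three parameters documented above (the DataFrame `df` and all argument values are illustrative only):

   ```python
   # Parameter usage sketch for DataFrame.show as documented in the docstring above.
   df.show()                     # up to 20 rows, strings longer than 20 chars truncated
   df.show(n=5, truncate=False)  # up to 5 rows, no truncation
   df.show(truncate=3)           # cells truncated to 3 characters and right-aligned
   df.show(vertical=True)        # one line per column value instead of a grid
   ```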





[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020484857


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   Or we could say that if a field is required, we still use `optional` so that it offers the `hasXXX` method. This is also fine, but we probably need a consensus. I will refine what I had in https://github.com/apache/spark/pull/38605.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1021008411


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   good idea, will update





[GitHub] [spark] amaliujia commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1021007865


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -217,6 +217,12 @@ def test_empty_dataset(self):
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
 
+    def test_show(self):
+        # SPARK-41111: Test the show method
+        show_str = self.connect.sql("SELECT 1 AS X, 2 AS Y")._show_string()
+        expected = "+---+---+\n|  X|  Y|\n+---+---+\n|  1|  2|\n+---+---+\n"

Review Comment:
   sounds good then!





[GitHub] [spark] cloud-fan commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1021065768


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   Is there a best practice in the industry? Making everything `optional` is also fine to me, so that the server side can detect more malformed proto messages.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38621: [SPARK-41111][CONNECT][PYTHON] Implement `DataFrame.show`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38621:
URL: https://github.com/apache/spark/pull/38621#discussion_r1020209283


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -253,6 +254,23 @@ message Repartition {
   bool shuffle = 3;
 }
 
+// Compose the string representing rows for output.
+// It will invoke 'Dataset.showString' to compute the results.
+message ShowString {
+  // (Required). The input relation.
+  Relation input = 1;
+
+  // (Required) Number of rows to show.

Review Comment:
   Optional?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org