You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/17 03:19:21 UTC

[GitHub] [spark] beliefer opened a new pull request, #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

beliefer opened a new pull request, #39103:
URL: https://github.com/apache/spark/pull/39103

   ### What changes were proposed in this pull request?
   Currently, `pyspark_types_to_proto_types` used to transform pyspark datatypes to protobuffer datatypes.
   But it only supports the primary data type, no struct type.
   Many connect API need to transform pyspark struct type to protobuffer struct type. For example, `createDataFrame`, `DataFrame.to` and so on.
   
   
   ### Why are the changes needed?
   This PR let `pyspark_types_to_proto_types` support `StructType`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #39103:
URL: https://github.com/apache/spark/pull/39103#issuecomment-1356121399

   ping @HyukjinKwon @zhengruifeng @amaliujia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39103:
URL: https://github.com/apache/spark/pull/39103#discussion_r1051537745


##########
python/pyspark/sql/connect/types.py:
##########
@@ -75,6 +75,15 @@ def pyspark_types_to_proto_types(data_type: DataType) -> pb2.DataType:
     elif isinstance(data_type, DayTimeIntervalType):
         ret.day_time_interval.start_field = data_type.startField
         ret.day_time_interval.end_field = data_type.endField
+    elif isinstance(data_type, StructType):
+        fields = []
+        for field in data_type.fields:
+            struct_field = pb2.DataType.StructField()
+            struct_field.name = field.name
+            struct_field.data_type.CopyFrom(pyspark_types_to_proto_types(field.dataType))
+            struct_field.nullable = field.nullable
+            fields.append(struct_field)
+        ret.struct.fields.extend(fields)

Review Comment:
   ```suggestion
                ret.struct.fields.append(struct_field)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #39103:
URL: https://github.com/apache/spark/pull/39103#issuecomment-1356970142

   @grundprinzip @HyukjinKwon @zhengruifeng Thank you for all !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39103:
URL: https://github.com/apache/spark/pull/39103#discussion_r1051537745


##########
python/pyspark/sql/connect/types.py:
##########
@@ -75,6 +75,15 @@ def pyspark_types_to_proto_types(data_type: DataType) -> pb2.DataType:
     elif isinstance(data_type, DayTimeIntervalType):
         ret.day_time_interval.start_field = data_type.startField
         ret.day_time_interval.end_field = data_type.endField
+    elif isinstance(data_type, StructType):
+        fields = []
+        for field in data_type.fields:
+            struct_field = pb2.DataType.StructField()
+            struct_field.name = field.name
+            struct_field.data_type.CopyFrom(pyspark_types_to_proto_types(field.dataType))
+            struct_field.nullable = field.nullable
+            fields.append(struct_field)
+        ret.struct.fields.extend(fields)

Review Comment:
   This does not need the extra copy, you can directly append to the return type array.
   
   ```suggestion
                ret.struct.fields.append(struct_field)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.
URL: https://github.com/apache/spark/pull/39103


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39103:
URL: https://github.com/apache/spark/pull/39103#issuecomment-1356943639

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #39103:
URL: https://github.com/apache/spark/pull/39103#issuecomment-1358029945

   late LGTM!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org