Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/30 10:54:40 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #39313: [SPARK-41745][SPARK-41789] Make `createDataFrame` support list of Rows

zhengruifeng opened a new pull request, #39313:
URL: https://github.com/apache/spark/pull/39313

   ### What changes were proposed in this pull request?
   Make the Spark Connect `createDataFrame` accept a list of `Row` objects.
   
   
   ### Why are the changes needed?
   To be consistent with PySpark, where `createDataFrame` already accepts a list of `Row` objects.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes: `createDataFrame` now accepts a list of `Row` objects.
   
   
   ### How was this patch tested?
   Added a unit test (`test_with_local_rows`).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #39313: [SPARK-41745][SPARK-41789] Make `createDataFrame` support list of Rows

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on PR #39313:
URL: https://github.com/apache/spark/pull/39313#issuecomment-1368138619

   Merged to master.



[GitHub] [spark] zhengruifeng commented on a diff in pull request #39313: [SPARK-41745][SPARK-41789] Make `createDataFrame` support list of Rows

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on code in PR #39313:
URL: https://github.com/apache/spark/pull/39313#discussion_r1059340299


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -387,6 +388,30 @@ def test_with_local_list(self):
         with self.assertRaises(SparkConnectException):
             self.connect.createDataFrame(data, "col1 int, col2 int, col3 int").show()
 
+    def test_with_local_rows(self):
+        # SPARK-41789: Test creating a dataframe with list of Rows
+        data = [
+            Row(course="dotNET", year=2012, earnings=10000),
+            Row(course="Java", year=2012, earnings=20000),
+            Row(course="dotNET", year=2012, earnings=5000),
+            Row(course="dotNET", year=2013, earnings=48000),
+            Row(course="Java", year=2013, earnings=30000),
+            Row(course="Scala", year=2022, earnings=None),
+        ]
+
+        sdf = self.spark.createDataFrame(data)
+        cdf = self.connect.createDataFrame(data)
+
+        self.assertEqual(sdf.schema, cdf.schema)
+        self.assert_eq(sdf.toPandas(), cdf.toPandas())
+
+        # test with rename
+        sdf = self.spark.createDataFrame(data, schema=["a", "b", "c"])
+        cdf = self.connect.createDataFrame(data, schema=["a", "b", "c"])
+
+        self.assertEqual(sdf.schema, cdf.schema)
+        self.assert_eq(sdf.toPandas(), cdf.toPandas())
+

Review Comment:
   Nested rows are not supported for now (see https://issues.apache.org/jira/browse/SPARK-41746); creating a DataFrame from them fails with:
   ```
   Traceback (most recent call last):
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 427, in test_with_local_nested_rows
       cdf = self.connect.createDataFrame(data)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/session.py", line 249, in createDataFrame
       table = pa.Table.from_pandas(pdf)
     File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
     File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
       arrays = [convert_column(c, f)
     File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
       arrays = [convert_column(c, f)
     File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
       raise e
     File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
       result = pa.array(col, type=type_, from_pandas=True, safe=safe)
     File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
     File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
   pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column 1 with type object')
   ```
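
The behaviour exercised by `test_with_local_rows` (column names taken from the row fields, optional positional rename through `schema`) can be sketched in plain Python. This is a minimal illustration only, not the code in this PR: `rows_to_table` is a hypothetical helper, `collections.namedtuple` stands in for `pyspark.sql.Row` so the sketch runs without Spark, and rejecting nested rows with a `TypeError` is an assumption made here rather than what Spark Connect does.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: used only so this runs without Spark).
Row = namedtuple("Row", ["course", "year", "earnings"])


def rows_to_table(rows, schema=None):
    """Hypothetical helper: return (column_names, records) for a list of rows."""
    if not rows:
        raise ValueError("cannot infer column names from an empty list")
    # Column names default to the field names of the first row.
    names = list(rows[0]._fields)
    if schema is not None:
        # A list of names renames the columns by position.
        if len(schema) != len(names):
            raise ValueError("schema length does not match the number of fields")
        names = list(schema)
    records = []
    for row in rows:
        # Assumption: reject nested rows up front instead of failing later in Arrow.
        if any(isinstance(value, tuple) for value in row):
            raise TypeError("nested rows are not supported in this sketch")
        records.append(tuple(row))
    return names, records


data = [
    Row(course="dotNET", year=2012, earnings=10000),
    Row(course="Scala", year=2022, earnings=None),
]
names, records = rows_to_table(data)
renamed, _ = rows_to_table(data, schema=["a", "b", "c"])
```

Failing early on a nested row gives a clearer message than the `ArrowTypeError` in the traceback above, which only surfaces once the data reaches `pa.Table.from_pandas`.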




[GitHub] [spark] HyukjinKwon closed pull request #39313: [SPARK-41745][SPARK-41789] Make `createDataFrame` support list of Rows

Posted by GitBox <gi...@apache.org>.

HyukjinKwon closed pull request #39313: [SPARK-41745][SPARK-41789] Make `createDataFrame` support list of Rows
URL: https://github.com/apache/spark/pull/39313

