Posted to issues@spark.apache.org by "Yikun Jiang (Jira)" <ji...@apache.org> on 2022/05/27 08:33:00 UTC
[jira] [Commented] (SPARK-39317) groupby.apply doc test failed when SPARK_CONF_ARROW_ENABLED is disabled
[ https://issues.apache.org/jira/browse/SPARK-39317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542830#comment-17542830 ]
Yikun Jiang commented on SPARK-39317:
-------------------------------------
Related:
https://github.com/pandas-dev/pandas/commit/af76bd5476a4715296fed8bfbd9c6c391edbfe3c
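The root cause can be shown without a Spark session: when Arrow is disabled, createDataFrame falls back to per-value verification (verify_acceptable_types in pyspark/sql/types.py, visible in the traceback below), and LongType only accepts Python int, which numpy.int64 does not subclass in Python 3. A minimal sketch of that type mismatch, plain numpy only (the .item() conversion is an illustrative workaround, not the proposed fix):

```python
import numpy as np

# groupby.apply can hand back numpy scalars; with Arrow disabled,
# pyspark verifies each value with an isinstance() check against
# the Python types LongType accepts.
value = np.int64(2)

# numpy.int64 is not a subclass of int in Python 3, so the check fails
# and raises "LongType() can not accept object 2 in type <class 'numpy.int64'>".
print(isinstance(value, int))         # False

# Converting the numpy scalar to a plain Python int passes the check.
print(isinstance(value.item(), int))  # True
print(value.item())                   # 2
```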
> groupby.apply doc test failed when SPARK_CONF_ARROW_ENABLED is disabled
> ------------------------------------------------------------------------
>
> Key: SPARK-39317
> URL: https://issues.apache.org/jira/browse/SPARK-39317
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Priority: Major
>
> {code:python}
> Traceback (most recent call last):
>   File "/Users/yikun/venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
>     exec(code_obj, self.user_global_ns, self.user_ns)
>   File "<ipython-input-1-3cad40f83c12>", line 1, in <cell line: 1>
>     g.apply(plus_min).sort_index()
>   File "/Users/yikun/spark/python/pyspark/pandas/groupby.py", line 1444, in apply
>     psser_or_psdf = ps.from_pandas(pser_or_pdf)
>   File "/Users/yikun/spark/python/pyspark/pandas/namespace.py", line 153, in from_pandas
>     return DataFrame(pobj)
>   File "/Users/yikun/spark/python/pyspark/pandas/frame.py", line 457, in __init__
>     internal = InternalFrame.from_pandas(pdf)
>   File "/Users/yikun/spark/python/pyspark/pandas/internal.py", line 1473, in from_pandas
>     sdf = default_session().createDataFrame(pdf, schema=schema)
>   File "/Users/yikun/spark/python/pyspark/sql/session.py", line 961, in createDataFrame
>     return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
>   File "/Users/yikun/spark/python/pyspark/sql/pandas/conversion.py", line 437, in createDataFrame
>     return self._create_dataframe(converted_data, schema, samplingRatio, verifySchema)
>   File "/Users/yikun/spark/python/pyspark/sql/session.py", line 1006, in _create_dataframe
>     rdd, struct = self._createFromLocal(map(prepare, data), schema)
>   File "/Users/yikun/spark/python/pyspark/sql/session.py", line 698, in _createFromLocal
>     data = list(data)
>   File "/Users/yikun/spark/python/pyspark/sql/session.py", line 980, in prepare
>     verify_func(obj)
>   File "/Users/yikun/spark/python/pyspark/sql/types.py", line 1763, in verify
>     verify_value(obj)
>   File "/Users/yikun/spark/python/pyspark/sql/types.py", line 1741, in verify_struct
>     verifier(v)
>   File "/Users/yikun/spark/python/pyspark/sql/types.py", line 1763, in verify
>     verify_value(obj)
>   File "/Users/yikun/spark/python/pyspark/sql/types.py", line 1686, in verify_long
>     verify_acceptable_types(obj)
>   File "/Users/yikun/spark/python/pyspark/sql/types.py", line 1633, in verify_acceptable_types
>     raise TypeError(
> TypeError: field B: LongType() can not accept object 2 in type <class 'numpy.int64'>
> {code}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org