Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/04/02 00:16:00 UTC
[jira] [Assigned] (SPARK-38763) Pandas API on Spark can't apply lambda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38763:
------------------------------------
Assignee: Apache Spark
> Pandas API on Spark can't apply lambda to columns.
> ---------------------------------------------------
>
> Key: SPARK-38763
> URL: https://issues.apache.org/jira/browse/SPARK-38763
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Bjørn Jørgensen
> Assignee: Apache Spark
> Priority: Major
>
> With a Spark master build from 08 November 2021, I could use this code to rename columns:
> {code:python}
> pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
> pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
> pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
> {code}
> But now I get this error when I run the same code:
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> Input In [5], in <cell line: 1>()
> ----> 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
> 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
> 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
> File /opt/spark/python/pyspark/pandas/frame.py:10636, in DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors)
> 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = gen_mapper_fn(
> 10633 index
> 10634 )
> 10635 if columns:
> > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns)
> 10638 if not index and not columns:
> 10639 raise ValueError("Either `index` or `columns` should be provided.")
> File /opt/spark/python/pyspark/pandas/frame.py:10603, in DataFrame.rename.<locals>.gen_mapper_fn(mapper)
> 10601 elif callable(mapper):
> 10602 mapper_callable = cast(Callable, mapper)
> > 10603 return_type = cast(ScalarType, infer_return_type(mapper))
> 10604 dtype = return_type.dtype
> 10605 spark_return_type = return_type.spark_type
> File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in infer_return_type(f)
> 560 tpe = get_type_hints(f).get("return", None)
> 562 if tpe is None:
> --> 563 raise ValueError("A return value is required for the input function")
> 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, SeriesType):
> 566 tpe = tpe.__args__[0]
> ValueError: A return value is required for the input function
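> The traceback shows that {{DataFrame.rename}} now passes a callable mapper through {{infer_return_type}}, which calls {{get_type_hints}} and raises when no return annotation is found. A lambda cannot carry a return-type annotation, so one possible workaround (a sketch, not an official fix; {{strip_prefix}} is a hypothetical helper name) is to use a plain {{def}} with an explicit {{-> str}} hint:
> {code:python}
> import re
> from typing import get_type_hints
>
> # A def can carry the "-> str" return hint that infer_return_type
> # looks up via get_type_hints(); a lambda cannot.
> def strip_prefix(x: str) -> str:
>     return re.sub('DOFFIN_ESENDERS:', '', x)
>
> # get_type_hints() finds the annotation on the def but returns an
> # empty dict for a lambda, which is what triggers the ValueError.
> print(get_type_hints(strip_prefix).get("return"))   # <class 'str'>
> print(get_type_hints(lambda x: x))                  # {}
> {code}
> With such an annotated function, {{pf05 = pf05.rename(columns=strip_prefix)}} would presumably avoid the "A return value is required" error, since the mapper's return type can be inferred.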
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org