Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/09/10 06:35:00 UTC
[jira] [Created] (SPARK-36707) Support to specify index type and name in pandas API on Spark
Hyukjin Kwon created SPARK-36707:
------------------------------------
Summary: Support to specify index type and name in pandas API on Spark
Key: SPARK-36707
URL: https://issues.apache.org/jira/browse/SPARK-36707
Project: Spark
Issue Type: Umbrella
Components: PySpark
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon
See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.
In pandas API on Spark, there is currently no way to specify the index type and name of the output when you apply an arbitrary function. This forces the creation of a default index:
{code}
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
...     pdf['A'] = pdf.id + 1
...     return pdf
...
>>> ps.range(5).pandas_on_spark.apply_batch(transform)
{code}
{code}
   id  A
0   0  1
1   1  2
2   2  3
3   3  4
4   4  5
{code}
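The `"id": int` syntax relies on standard Python behavior. Inside square brackets, each `name: type` pair becomes a `slice` object, and the whole tuple is passed to the class's `__class_getitem__`. The sketch below uses a hypothetical `FrameHint` class, a simplified stand-in and not the pandas API on Spark implementation, to show what such a hint can carry. It holds column names and types, with no slot for the index.

```python
from typing import Any, List, Tuple


class FrameHint:
    """Hypothetical stand-in for a subscriptable DataFrame return-type hint."""

    def __init__(self, columns: List[Tuple[str, Any]]) -> None:
        self.columns = columns

    def __class_getitem__(cls, params: Any) -> "FrameHint":
        # FrameHint["id": int] passes a single slice, while
        # FrameHint["id": int, "A": int] passes a tuple of slices.
        if not isinstance(params, tuple):
            params = (params,)
        # slice.start holds the column name and slice.stop holds its type.
        return cls([(p.start, p.stop) for p in params])


def transform(pdf) -> FrameHint["id": int, "A": int]:
    pdf["A"] = pdf["id"] + 1
    return pdf


hint = transform.__annotations__["return"]
print(hint.columns)  # [('id', <class 'int'>), ('A', <class 'int'>)]
```

Nothing in `hint` says which index the returned frame had, so an engine reading only this hint would have to attach a default index.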
We should have a way to specify the index type and name as well, so that the output does not have to fall back to the default index.
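One possible shape for such a hint is sketched below with a hypothetical `IndexedFrameHint` class. The syntax is an illustrative assumption, not an agreed API. It keeps the existing `name: type` slices as the no-index form and also accepts `Hint[<index>, [<columns>]]`, where `<index>` is a bare type or a `(name, type)` pair and each column is a `(name, type)` pair.

```python
from typing import Any, List, Optional, Tuple

# A (name, type) pair; the name may be None for an unnamed index.
Field = Tuple[Optional[str], Any]


class IndexedFrameHint:
    """Hypothetical return-type hint that may also describe the index."""

    def __init__(self, index: Optional[Field], columns: List[Field]) -> None:
        self.index = index  # None means no index was given: use a default index
        self.columns = columns

    def __class_getitem__(cls, params: Any) -> "IndexedFrameHint":
        if not isinstance(params, tuple):
            params = (params,)
        # Existing form: Hint["id": int, "A": int] describes columns only.
        if all(isinstance(p, slice) for p in params):
            return cls(None, [(p.start, p.stop) for p in params])
        # Assumed new form: Hint[<index>, [<columns>]].
        if len(params) == 2 and isinstance(params[1], list):
            index, columns = params
            if not isinstance(index, tuple):
                index = (None, index)  # bare type: unnamed index
            return cls(index, list(columns))
        raise TypeError(f"unsupported hint: {params!r}")


def transform(pdf) -> IndexedFrameHint[("idx", int), [("id", int), ("A", int)]]:
    pdf["A"] = pdf["id"] + 1
    return pdf


hint = transform.__annotations__["return"]
print(hint.index)    # ('idx', <class 'int'>)
print(hint.columns)  # [('id', <class 'int'>), ('A', <class 'int'>)]
print(IndexedFrameHint["id": int, "A": int].index)  # None
```

Because the slice form still maps to `index=None`, existing hints would keep getting a default index. Only hints that describe the index would let the result keep it.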
--
This message was sent by Atlassian Jira
(v8.3.4#803005)