Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/10/07 09:37:00 UTC

[jira] [Resolved] (SPARK-36707) Support to specify index type and name in pandas API on Spark

     [ https://issues.apache.org/jira/browse/SPARK-36707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-36707.
----------------------------------
    Fix Version/s: 3.3.0
         Assignee: Hyukjin Kwon
       Resolution: Done

> Support to specify index type and name in pandas API on Spark
> -------------------------------------------------------------
>
>                 Key: SPARK-36707
>                 URL: https://issues.apache.org/jira/browse/SPARK-36707
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.3.0
>
>
> See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.
> In pandas API on Spark, there is currently no way to specify the index type and name in the output when you apply an arbitrary function, which forces the default index to be created:
> {code}
> >>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
> ...     pdf['A'] = pdf.id + 1
> ...     return pdf
> ...
> >>> ps.range(5).pandas_on_spark.apply_batch(transform)
> {code}
> {code}
>    id   A
> 0   0   1
> 1   1   2
> 2   2   3
> 3   3   4
> 4   4   5
> {code}
> We should have a way to specify the index.
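
To make the gap concrete, here is a minimal plain-pandas sketch (no Spark involved) of the information a return type hint would need to carry in addition to the column names and types. The index name "idx", the index values, and the variables index_spec and column_spec are hypothetical and chosen only for illustration; this is not the API proposed or implemented by this issue.

```python
import pandas as pd


def transform(pdf: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the function in the issue, applied to a copy so the
    # caller's frame is left untouched.
    pdf = pdf.copy()
    pdf["A"] = pdf["id"] + 1
    return pdf


# A pandas DataFrame with a named, non-default index (hypothetical values).
pdf = pd.DataFrame(
    {"id": range(5)},
    index=pd.Index(range(10, 15), name="idx"),
)

out = transform(pdf)

# In plain pandas the index name and dtype survive the function call.
# A return hint that lists only columns has no slot for this pair.
index_spec = (out.index.name, out.index.dtype)

# The part that a column-only hint such as ["id": int, "A": int] can express.
column_spec = [(name, dtype) for name, dtype in out.dtypes.items()]

print("index  :", index_spec)
print("columns:", column_spec)
```

Recent Spark documentation on type hints describes a return hint that places an index specification before the column list, along the lines of pd.DataFrame[("idx", int), [("id", int), ("A", int)]]. Treat that form as something to check against the documentation for your Spark version rather than as confirmed by this message.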



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
