Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/09/10 06:35:00 UTC

[jira] [Created] (SPARK-36707) Support to specify index type and name in pandas API on Spark

Hyukjin Kwon created SPARK-36707:
------------------------------------

             Summary: Support to specify index type and name in pandas API on Spark
                 Key: SPARK-36707
                 URL: https://issues.apache.org/jira/browse/SPARK-36707
             Project: Spark
          Issue Type: Umbrella
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: Hyukjin Kwon


See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.

The pandas API on Spark currently has no way to specify the index type and name in the output when you apply an arbitrary function, so the result is forced to use the default index:

{code}
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
...     pdf['A'] = pdf.id + 1
...     return pdf
...
>>> ps.range(5).koalas.apply_batch(transform)
{code}

{code}
   id   A
0   0   1
1   1   2
2   2   3
3   3   4
4   4   5
{code}

We should have a way to specify the index.
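
For comparison, here is a minimal plain-pandas sketch (no Spark involved) of the information that the return type hint above has no slot for. The index name "key" and the index values are hypothetical, chosen only for illustration; this is not the proposed syntax, just a picture of what gets lost when the default index is created.

```python
import pandas as pd


def transform(pdf: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the function in the issue, without the
    # pandas-on-Spark return type hint. Copy to avoid mutating the input.
    pdf = pdf.copy()
    pdf["A"] = pdf["id"] + 1
    return pdf


# Hypothetical input with a named, non-default index.
pdf = pd.DataFrame(
    {"id": [0, 1, 2]},
    index=pd.Index([10, 20, 30], name="key"),
)
result = transform(pdf)

# Plain pandas carries the index name and values through the function.
# A hint that lists only the columns "id" and "A" cannot describe them.
print(result.index.name)
print(result.index.tolist())
```

In plain pandas the index survives the call unchanged, which is the behavior a user would likely expect the type hint to be able to describe.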


