Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/19 03:58:00 UTC

[jira] [Commented] (SPARK-27756) Add a shape property to DataFrame in pyspark

    [ https://issues.apache.org/jira/browse/SPARK-27756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843298#comment-16843298 ] 

Hyukjin Kwon commented on SPARK-27756:
--------------------------------------

As you described, it's easy to get this in one line. PySpark doesn't need to add it just because pandas has it.
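For reference, here is a minimal sketch of that one-liner wrapped in a reusable helper. The name `df_shape` is hypothetical and not part of PySpark. It relies only on `DataFrame.count()` and `DataFrame.columns`, which PySpark DataFrames already provide:

```python
def df_shape(df):
    """Return (number_of_rows, number_of_columns) for a PySpark DataFrame.

    Hypothetical helper, not a PySpark API. df.count() launches a Spark job
    that scans the whole dataset, while len(df.columns) only reads the schema.
    """
    return (df.count(), len(df.columns))
```

Usage: `df_shape(df)` returns a `(rows, columns)` tuple. Because of the `count()` call, each invocation can be expensive on large data.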

> Add a shape property to DataFrame in pyspark
> --------------------------------------------
>
>                 Key: SPARK-27756
>                 URL: https://issues.apache.org/jira/browse/SPARK-27756
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 2.4.3
>            Reporter: Louis Yang
>            Priority: Minor
>
> It would be great if PySpark DataFrame supported a simple shape attribute that returns the number of rows and columns, similar to what [pandas|https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html] has.
> We can add the following to the DataFrame class:
>  
> {code:python}
> @property
> def shape(self):
>     # count() runs a Spark job over all rows; len(self.columns) only reads the schema.
>     return (self.count(), len(self.columns))
> {code}
> Then users in Python can simply do:
>  
> {code:python}
> >>> df.shape
> (10000, 20)
> {code}
> to get the most basic information about a DataFrame when working interactively.
>  
>  
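If the attribute-style `df.shape` from the report is wanted without a change to PySpark itself, user code could attach the same property on its own. This is a minimal sketch, assuming that monkeypatching `pyspark.sql.DataFrame` is acceptable in your environment. `add_shape_property` is a hypothetical helper name:

```python
def add_shape_property(cls):
    """Attach a read-only `shape` property to `cls` (hypothetical helper).

    The property mirrors the proposal in the issue: it returns
    (row count, column count). Every access calls self.count(),
    which runs a Spark job, so the result is not cached.
    """
    cls.shape = property(lambda self: (self.count(), len(self.columns)))
    return cls
```

Usage: after `from pyspark.sql import DataFrame`, call `add_shape_property(DataFrame)` once. From then on, `df.shape` works for DataFrames in that Python process. The trade-off is that the patch is local to your code and is not part of the PySpark API.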



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)