Posted to issues@spark.apache.org by "Sean Owen (Jira)" <ji...@apache.org> on 2019/09/16 18:35:00 UTC

[jira] [Resolved] (SPARK-24671) DataFrame length using a dunder/magic method in PySpark

     [ https://issues.apache.org/jira/browse/SPARK-24671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-24671.
-------------------------------
    Resolution: Won't Fix

> DataFrame length using a dunder/magic method in PySpark
> -------------------------------------------------------
>
>                 Key: SPARK-24671
>                 URL: https://issues.apache.org/jira/browse/SPARK-24671
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Ondrej Kokes
>            Priority: Minor
>
> In Python, if a class implements a method called __len__, the builtin `len` function returns the length of an instance of that class, whatever length means in its context. This is, for example, how you get the number of rows of a pandas DataFrame.
> It should be straightforward to add this functionality to PySpark because df.count() is already implemented, so the patch I'm proposing is just two lines of code (and two lines of tests). It's in this commit; I'll submit a PR shortly.
> https://github.com/kokes/spark/commit/4d0afaf3cd046b11e8bae43dc00ddf4b1eb97732
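
For illustration, the delegation described above can be sketched in plain Python. This is not the content of the linked commit and does not use PySpark; `ToyFrame` is a hypothetical stand-in class, and its `count()` simply counts an in-memory list, whereas a real `df.count()` would run a Spark job.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not PySpark's class."""

    def __init__(self, rows):
        # Keep the rows in memory; a real DataFrame is distributed.
        self._rows = list(rows)

    def count(self):
        # Stand-in for DataFrame.count(): return the number of rows.
        return len(self._rows)

    def __len__(self):
        # Delegate to count(), so the builtin len() returns the row count.
        return self.count()


frame = ToyFrame([("a", 1), ("b", 2), ("c", 3)])
row_count = len(frame)  # calls ToyFrame.__len__, which calls count()
```

One consequence of this design in Python generally: when a class defines `__len__` but not `__bool__`, truth tests such as `if frame:` also call `__len__`, so in this sketch every such check would invoke `count()`.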


