Posted to issues@spark.apache.org by "Jacob Duenke (Jira)" <ji...@apache.org> on 2021/10/21 23:07:00 UTC

[jira] [Commented] (SPARK-32423) class 'DataFrame' returns instance of type(self) instead of DataFrame

    [ https://issues.apache.org/jira/browse/SPARK-32423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432744#comment-17432744 ] 

Jacob Duenke commented on SPARK-32423:
--------------------------------------

I'm looking for the same thing a year later. I believe what the OP was referring to is the `DataFrame` class: some 40 or so of its methods build their result by calling `DataFrame(...)` directly, like the one below. https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761

```python
def limit(self, num: int) -> "DataFrame":
    """Limits the result count to the number specified.

    .. versionadded:: 1.3.0

    Examples
    --------
    >>> df.limit(1).collect()
    [Row(age=2, name='Alice')]
    >>> df.limit(0).collect()
    []
    """
    jdf = self._jdf.limit(num)
    # Hardcoded class: calling this on a DataFrame subclass returns a plain DataFrame.
    return DataFrame(jdf, self.sql_ctx)
```
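To make the effect of that hardcoded `DataFrame(...)` call concrete, here is a minimal, Spark-free sketch. `HardcodedFrame`, `SelfTypedFrame` and their subclasses are hypothetical stand-ins (not PySpark classes), and a plain Python list plays the role of the wrapped Java DataFrame:

```python
# Hypothetical stand-ins for pyspark.sql.DataFrame; no Spark or JVM involved.
# A plain list in `_data` plays the role of the wrapped Java DataFrame (`_jdf`).


class HardcodedFrame:
    """Mimics the current pattern: methods always build the base class."""

    def __init__(self, data):
        self._data = list(data)

    def limit(self, num):
        # Like `return DataFrame(jdf, self.sql_ctx)`: the class is fixed,
        # so a subclass instance comes back as the base class.
        return HardcodedFrame(self._data[:num])


class SelfTypedFrame:
    """Mimics the proposal: methods build an instance of type(self)."""

    def __init__(self, data):
        self._data = list(data)

    def limit(self, num):
        # Like `return type(self)(jdf, self.sql_ctx)`: the subclass survives,
        # provided it keeps the base class's constructor signature.
        return type(self)(self._data[:num])


class MyHardcoded(HardcodedFrame):
    def first_or_none(self):
        return self._data[0] if self._data else None


class MySelfTyped(SelfTypedFrame):
    def first_or_none(self):
        return self._data[0] if self._data else None


print(type(MyHardcoded([1, 2, 3]).limit(1)).__name__)   # HardcodedFrame
print(type(MySelfTyped([1, 2, 3]).limit(1)).__name__)   # MySelfTyped
print(MySelfTyped([1, 2, 3]).limit(1).first_or_none())  # 1
```

With the hardcoded version, `MyHardcoded([1, 2, 3]).limit(1).first_or_none()` raises `AttributeError`, because the subclass's extra methods are lost after the first call. The `type(self)` variant assumes every subclass accepts the same constructor arguments as the base class; a subclass whose `__init__` requires extra parameters would break it.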
 
If these methods returned an instance of `type(self)` instead (e.g. `return type(self)(jdf, self.sql_ctx)`), it would be much easier to subclass `DataFrame` for our own uses. Otherwise, we are forced to copy and override all 40 or so of these methods just so they return our extended class, `MyDataFrameClass`.
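Until `DataFrame` does this itself, one possible way around the copying (a sketch, not something PySpark provides) is to wrap the inherited methods once, in a class decorator that converts any returned base-class instance into the subclass. `Frame`, `preserve_subclass` and `MyFrame` are hypothetical; against real PySpark, the rebuild step would have to construct the subclass from the returned DataFrame's private internals (such as `_jdf`), which may change between versions.

```python
import functools


class Frame:
    """Hypothetical stand-in for a library class we cannot modify."""

    def __init__(self, data):
        self._data = list(data)

    def limit(self, num):
        return Frame(self._data[:num])  # hardcoded base class

    def reverse(self):
        return Frame(reversed(self._data))  # hardcoded base class

    def count(self):
        return len(self._data)  # not a Frame; passes through untouched


def preserve_subclass(cls):
    """Hypothetical helper: wrap the public methods defined directly on the
    first base of `cls` so that any returned base-class instance is rebuilt
    as `cls`. Assumes `cls(result._data)` can reconstruct the object."""
    base = cls.__bases__[0]
    for name, attr in vars(base).items():
        # Skip private/dunder names, non-callables (e.g. properties),
        # and methods the subclass already overrides.
        if name.startswith("_") or not callable(attr) or name in vars(cls):
            continue

        def make_wrapper(method):
            @functools.wraps(method)
            def wrapper(self, *args, **kwargs):
                result = method(self, *args, **kwargs)
                # Re-cast exact base-class results; leave everything else alone.
                if type(result) is base:
                    return cls(result._data)
                return result
            return wrapper

        setattr(cls, name, make_wrapper(attr))
    return cls


@preserve_subclass
class MyFrame(Frame):
    def first_or_none(self):
        return self._data[0] if self._data else None


df = MyFrame([3, 2, 1])
print(type(df.limit(2).reverse()).__name__)   # MyFrame
print(df.limit(2).reverse().first_or_none())  # 2
print(df.count())                             # 3
```

This only covers methods defined directly on the immediate base class; methods the base inherits from its own parents, and methods reached through helper objects (for example `df.na`), would need the same treatment.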

> class 'DataFrame' returns instance of type(self) instead of DataFrame 
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32423
>                 URL: https://issues.apache.org/jira/browse/SPARK-32423
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Timothy
>            Priority: Minor
>
> To allow for appropriate subclassing of DataFrame, I propose the following change:
> class 'DataFrame' returns an instance of type(self) instead of DataFrame
>
> That way, methods such as '.limit()' called on a child class will return an instance of the child class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
