Posted to issues@spark.apache.org by "Jacob Duenke (Jira)" <ji...@apache.org> on 2021/10/21 23:07:00 UTC
[jira] [Commented] (SPARK-32423) class 'DataFrame' returns instance of type(self) instead of DataFrame
[ https://issues.apache.org/jira/browse/SPARK-32423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432744#comment-17432744 ]
Jacob Duenke commented on SPARK-32423:
--------------------------------------
I'm looking for this same thing a year later. I believe what the OP was referring to is the DataFrame class. Some 40 or so of its methods return a new DataFrame object, like the one below. https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761
```python
def limit(self, num: int) -> "DataFrame":
    """Limits the result count to the number specified.

    .. versionadded:: 1.3.0

    Examples
    --------
    >>> df.limit(1).collect()
    [Row(age=2, name='Alice')]
    >>> df.limit(0).collect()
    []
    """
    jdf = self._jdf.limit(num)
    return DataFrame(jdf, self.sql_ctx)
```
If these methods returned `type(self)` instead, it would be easier to extend the DataFrame class for our own uses. As it stands, we are forced to copy all 40 or so of these methods into our extended class, "MyDataFrameClass", just to ensure they return the extended type.
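To illustrate the point, here is a minimal, framework-free sketch of the `type(self)` pattern. `BaseFrame` and `MyFrame` are hypothetical stand-ins for pyspark's DataFrame and a user subclass; this is not actual Spark code:

```python
class BaseFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame."""

    def __init__(self, data):
        self._data = data

    def limit(self, num: int) -> "BaseFrame":
        # Returning type(self)(...) rather than BaseFrame(...) preserves
        # the caller's class: MyFrame.limit() yields a MyFrame.
        return type(self)(self._data[:num])


class MyFrame(BaseFrame):
    """User extension; inherits limit() without re-copying it."""

    def describe_size(self) -> str:
        return f"{len(self._data)} rows"


mf = MyFrame([1, 2, 3])
smaller = mf.limit(2)
print(type(smaller).__name__)   # MyFrame
print(smaller.describe_size())  # 2 rows
```

Had `limit` hard-coded `return BaseFrame(...)`, the call `smaller.describe_size()` would raise AttributeError, which is exactly the situation DataFrame subclasses face today.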
> class 'DataFrame' returns instance of type(self) instead of DataFrame
> ----------------------------------------------------------------------
>
> Key: SPARK-32423
> URL: https://issues.apache.org/jira/browse/SPARK-32423
> Project: Spark
> Issue Type: Wish
> Components: PySpark
> Affects Versions: 2.4.6, 3.0.0
> Reporter: Timothy
> Priority: Minor
>
> To allow for appropriate child classing of DataFrame, I propose the following change:
> class 'DataFrame' returns instance of type(self) instead of DataFrame
>
> Therefore child classes using methods such as '.limit()' will return an instance of the child class.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)