Posted to commits@spark.apache.org by gu...@apache.org on 2022/03/21 12:03:49 UTC
[spark] branch master updated: [SPARK-38612][PYTHON] Fix Inline type hint for duplicated.keep
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new acb50d9 [SPARK-38612][PYTHON] Fix Inline type hint for duplicated.keep
acb50d9 is described below
commit acb50d95a4952dea1cbbc27d4ddcc0b3432a13cf
Author: Yikun Jiang <yi...@gmail.com>
AuthorDate: Mon Mar 21 21:02:39 2022 +0900
[SPARK-38612][PYTHON] Fix Inline type hint for duplicated.keep
### What changes were proposed in this pull request?
Fix Inline type hint for `duplicated.keep`
### Why are the changes needed?
In pandas, `keep` accepts `"first"`, `"last"`, or `False`, so the previous `keep: str` annotation was too narrow and rejected the valid `keep=False` case.
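For context, the three accepted values follow pandas' duplicate-marking semantics: `"first"` marks every occurrence except the first, `"last"` every occurrence except the last, and `False` marks all occurrences. A minimal pure-Python sketch of that behavior (illustrative only, not the pyspark implementation):

```python
from collections import Counter
from typing import List, Union


def duplicated(values: List, keep: Union[bool, str] = "first") -> List[bool]:
    """Boolean mask marking duplicates, mimicking pandas' `keep` semantics."""
    if keep == "first":
        seen = set()
        mask = []
        for v in values:
            mask.append(v in seen)  # True once the value has appeared before
            seen.add(v)
        return mask
    if keep == "last":
        seen = set()
        mask = []
        for v in reversed(values):  # scan backwards, then restore order
            mask.append(v in seen)
            seen.add(v)
        return mask[::-1]
    if keep is False:
        counts = Counter(values)  # mark every member of a duplicated group
        return [counts[v] > 1 for v in values]
    raise ValueError('keep must be "first", "last", or False')


print(duplicated(["a", "b", "a", "a"], keep="first"))  # [False, False, True, True]
print(duplicated(["a", "b", "a", "a"], keep=False))    # [True, False, True, True]
```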
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Closes #35920 from Yikun/SPARK-38612.
Authored-by: Yikun Jiang <yi...@gmail.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/pandas/frame.py | 6 +++---
python/pyspark/pandas/series.py | 4 +++-
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 41a0dde..b355708 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -4307,7 +4307,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
def _mark_duplicates(
self,
subset: Optional[Union[Name, List[Name]]] = None,
- keep: str = "first",
+ keep: Union[bool, str] = "first",
) -> Tuple[SparkDataFrame, str]:
if subset is None:
subset_list = self._internal.column_labels
@@ -4350,7 +4350,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
def duplicated(
self,
subset: Optional[Union[Name, List[Name]]] = None,
- keep: str = "first",
+ keep: Union[bool, str] = "first",
) -> "Series":
"""
Return boolean Series denoting duplicate rows, optionally only considering certain columns.
@@ -9037,7 +9037,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
def drop_duplicates(
self,
subset: Optional[Union[Name, List[Name]]] = None,
- keep: str = "first",
+ keep: Union[bool, str] = "first",
inplace: bool = False,
) -> Optional["DataFrame"]:
"""
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 038f78f..cae0838 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -1647,7 +1647,9 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
tolist = to_list
- def drop_duplicates(self, keep: str = "first", inplace: bool = False) -> Optional["Series"]:
+ def drop_duplicates(
+ self, keep: Union[bool, str] = "first", inplace: bool = False
+ ) -> Optional["Series"]:
"""
Return Series with duplicate values removed.
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org