Posted to issues@spark.apache.org by "Nikesh (Jira)" <ji...@apache.org> on 2022/11/04 22:25:00 UTC
[jira] [Created] (SPARK-41018) Koalas.idxmin() is not picking the minimum value from a dataframe, but pandas.idxmin() gives
Nikesh created SPARK-41018:
------------------------------
Summary: Koalas.idxmin() is not picking the minimum value from a dataframe, but pandas.idxmin() gives
Key: SPARK-41018
URL: https://issues.apache.org/jira/browse/SPARK-41018
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.3.1
Environment: databricks
Reporter: Nikesh
Fix For: 3.3.1
Attachments: ZScoreWithKoalas_PandasOnSpark_BiggerDataset.html, ZScoreWithKoalas_PandasOnSpark_SmallerDataset.html
Hi,
I have a Koalas DataFrame with age and income columns. I calculated the z-score of age and income, and then computed a norm from age_zscore and income_zscore (the new column is named sq_dist). I then called idxmin on the new column, but it does not return the minimum value.
I performed the same operations on a pandas DataFrame, and there it returns the minimum value.
Please find attached the notebook with the step-by-step operations I performed.
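For reference, the steps described above can be sketched in plain pandas roughly as follows. This is only an illustration under assumptions: the sample data is made up (the real data is in the attached notebooks), the z-score uses the pandas default sample standard deviation, and sq_dist is assumed to be the sum of the squared z-scores.

```python
import pandas as pd

# Hypothetical sample data; the actual dataset is only in the attached notebooks.
pdf = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [30000, 52000, 81000, 95000, 60000],
})

# Z-score of each column (Series.std() defaults to the sample std, ddof=1).
pdf["age_zscore"] = (pdf["age"] - pdf["age"].mean()) / pdf["age"].std()
pdf["income_zscore"] = (pdf["income"] - pdf["income"].mean()) / pdf["income"].std()

# Assumed definition of sq_dist: squared norm of the two z-scores.
pdf["sq_dist"] = pdf["age_zscore"] ** 2 + pdf["income_zscore"] ** 2

# idxmin() returns the index label of the smallest value, not the value itself.
min_label = pdf["sq_dist"].idxmin()
min_value = pdf["sq_dist"].min()

# Looking the label up with .loc should give back the minimum value.
assert pdf.loc[min_label, "sq_dist"] == min_value
print(min_label, min_value)
```

The same frame can be converted with pyspark.pandas.from_pandas(pdf) to repeat the idxmin() call on pandas-on-Spark and compare the returned label against the pandas result.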
--
This message was sent by Atlassian Jira
(v8.20.10#820010)