Posted to issues@spark.apache.org by "Nikesh (Jira)" <ji...@apache.org> on 2022/11/04 22:25:00 UTC

[jira] [Created] (SPARK-41018) Koalas.idxmin() is not picking the minimum value from a dataframe, but pandas.idxmin() gives

Nikesh created SPARK-41018:
------------------------------

             Summary: Koalas.idxmin() is not picking the minimum value from a dataframe, but pandas.idxmin() gives
                 Key: SPARK-41018
                 URL: https://issues.apache.org/jira/browse/SPARK-41018
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.1
         Environment: databricks
            Reporter: Nikesh
             Fix For: 3.3.1
         Attachments: ZScoreWithKoalas_PandasOnSpark_BiggerDataset.html, ZScoreWithKoalas_PandasOnSpark_SmallerDataset.html

Hi,
I have a Koalas DataFrame with age and income columns. I calculated the z-score of age and of income, then computed a norm from age_zscore and income_zscore and stored it in a new column named sq_dist. I then called idxmin() on the new column, but it does not pick the row with the minimum value.
I performed the same operations on a pandas DataFrame, and there idxmin() does pick the minimum value.

Please find attached the notebooks showing the step-by-step operations I performed.
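
For readers without access to the attachments, the steps described above can be sketched roughly as follows on the pandas side. This is only an illustration: the sample values are made up, and defining sq_dist as the sum of the squared z-scores is an assumption, since the exact formula is only in the attached notebooks.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is only in the attached notebooks.
pdf = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [30000, 42000, 88000, 61000, 52000],
})

# Z-score of each column (pandas std() uses ddof=1 by default).
pdf["age_zscore"] = (pdf["age"] - pdf["age"].mean()) / pdf["age"].std()
pdf["income_zscore"] = (pdf["income"] - pdf["income"].mean()) / pdf["income"].std()

# Assumed definition of sq_dist: squared Euclidean norm of the two z-scores.
pdf["sq_dist"] = pdf["age_zscore"] ** 2 + pdf["income_zscore"] ** 2

# Index label of the row with the smallest sq_dist, and the value at that label.
min_label = pdf["sq_dist"].idxmin()
min_value = pdf.loc[min_label, "sq_dist"]

# Sanity check: the value at the idxmin() label should equal the column minimum.
assert min_value == pdf["sq_dist"].min()
print(min_label, min_value)
```

The same check could in principle be repeated on the pandas-on-Spark side, for example by converting the frame with pyspark.pandas.from_pandas(pdf) and comparing the value at the label returned by idxmin() against min() of the same column. If the two differ there, that would reproduce the behaviour reported here.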



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
