
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:33:32 UTC

[jira] [Resolved] (SPARK-9237) Added Top N Column Values for DataFrames

     [ https://issues.apache.org/jira/browse/SPARK-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-9237.
---------------------------------
    Resolution: Incomplete

> Added Top N Column Values for DataFrames
> ----------------------------------------
>
>                 Key: SPARK-9237
>                 URL: https://issues.apache.org/jira/browse/SPARK-9237
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Theodore michael Malaska
>            Priority: Minor
>              Labels: bulk-closed
>
> This JIRA adds a very common data quality check to DataFrames.
> A quick outline of this functionality can be seen in the following blog post:
> http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
> There are two parts to this JIRA:
> 1. How to implement the Top N count. I will start with the implementation in the blog post.
> 2. Where to add the function: either directly on DataFrame, in DataFrame's describe, or in DataFrameStatFunctions. I will start by putting it into DataFrameStatFunctions.
> Please let me know if you have any input.
> Thanks
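
For readers unfamiliar with the check described above, here is a minimal sketch of what a "Top N column values" computation produces. It is plain Python over a list of dicts, not Spark code and not the implementation from the blog post or any patch on this issue. The function name `top_n_column_values` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import Counter


def top_n_column_values(rows, n):
    """Return, for each column, the n most frequent values with their counts.

    `rows` is an iterable of dicts mapping column name -> value.
    This is an illustrative stand-in for a DataFrame, not a Spark API.
    """
    counters = {}
    for row in rows:
        for column, value in row.items():
            # One Counter per column, created on first sight of that column.
            counters.setdefault(column, Counter())[value] += 1
    # most_common(n) sorts by descending count and keeps the first n entries.
    return {column: counter.most_common(n) for column, counter in counters.items()}


# Hypothetical sample data.
rows = [
    {"city": "NYC", "status": "ok"},
    {"city": "NYC", "status": "error"},
    {"city": "SF", "status": "ok"},
    {"city": "NYC", "status": "ok"},
]

result = top_n_column_values(rows, 1)
print(result)
```

On a real DataFrame the same idea would likely be expressed per column as a group-by on the column, a count, a descending sort on the count, and a limit of N. Whether that belongs on DataFrame itself, in describe, or in DataFrameStatFunctions is the open question raised in the description.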



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
