Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:33:32 UTC
[jira] [Resolved] (SPARK-9237) Added Top N Column Values for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-9237.
---------------------------------
Resolution: Incomplete
> Added Top N Column Values for DataFrames
> ----------------------------------------
>
> Key: SPARK-9237
> URL: https://issues.apache.org/jira/browse/SPARK-9237
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Theodore michael Malaska
> Priority: Minor
> Labels: bulk-closed
>
> This JIRA is to add a very common data quality check to DataFrames.
> A quick outline of this functionality can be seen in the following blog post:
> http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
> There are two parts to this JIRA:
> 1. How to implement the Top N Count. I will start with the implementation in the blog post.
> 2. Where to add the function: either directly on DataFrame, in DataFrame describe, or in DataFrameStatFunctions. I will start by putting it into DataFrameStatFunctions.
> Please let me know if you have any input.
> Thanks
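
For readers unfamiliar with the check described above, here is a minimal sketch of a "Top N column values" count in plain Python. It is not the blog post's implementation and not a Spark API: the function name `top_n_column_values` and the sample rows are hypothetical, chosen only to illustrate the computation. On a Spark DataFrame the same result is commonly obtained by grouping on the column, counting, ordering by count descending, and limiting to N rows.

```python
from collections import Counter


def top_n_column_values(rows, column, n):
    """Return the n most frequent values of `column` as (value, count) pairs.

    `rows` is an iterable of dicts standing in for DataFrame rows
    (an assumption made for this sketch). A missing key is counted as None,
    so absent values show up in the result like any other value.
    """
    counts = Counter(row.get(column) for row in rows)
    # Order by count descending; break ties on the value's string form
    # so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:n]


# Hypothetical sample data.
rows = [
    {"city": "NYC", "state": "NY"},
    {"city": "NYC", "state": "NY"},
    {"city": "Albany", "state": "NY"},
    {"city": "Austin", "state": "TX"},
]

top_cities = top_n_column_values(rows, "city", 2)
top_states = top_n_column_values(rows, "state", 1)
```

As a data quality check, a column whose top values are dominated by a placeholder (an empty string, a default date, or None) is usually worth a closer look.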
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)