Posted to user@spark.apache.org by SparknewUser <me...@gmail.com> on 2015/07/30 11:33:18 UTC
How to perform basic statistics on a JSON file to explore my numeric and non-numeric variables?
I've imported a JSON file which has this schema:
sqlContext.read.json("filename").printSchema
root
|-- COL: long (nullable = true)
|-- DATA: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Crate: string (nullable = true)
| | |-- MLrate: string (nullable = true)
| | |-- Nrout: string (nullable = true)
| | |-- up: string (nullable = true)
|-- IFAM: string (nullable = true)
|-- KTM: long (nullable = true)
I'm new to Spark and I want to perform basic statistics like:
* getting the min, max, mean, median and std of numeric variables
* getting the value frequencies of non-numeric variables.
My questions are:
- How do I change the type of variables in my schema from 'string' to numeric? (Crate, MLrate and Nrout should be numeric variables.)
- How can I compute those basic statistics easily?
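To make the question concrete, here is the kind of transformation I imagine is needed, sketched against the Spark 1.4 DataFrame API (untested; the column names come from the schema above, and flattening the DATA array with explode is my assumption about how to reach the nested fields):

```scala
// Untested sketch, Spark 1.4 DataFrame API; column names taken from the schema above.
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext: SQLContext = ??? // the SQLContext already available in the shell
import sqlContext.implicits._

val df = sqlContext.read.json("filename")

// Flatten the DATA array so each struct element becomes a row,
// then cast the nested string fields to doubles.
val flat = df.select($"COL", $"IFAM", $"KTM", explode($"DATA").as("d"))
  .select(
    $"COL", $"IFAM", $"KTM",
    $"d.Crate".cast("double").as("Crate"),
    $"d.MLrate".cast("double").as("MLrate"),
    $"d.Nrout".cast("double").as("Nrout"))

// count, mean, stddev, min and max for the numeric columns
// (describe() does not return the median).
flat.describe("Crate", "MLrate", "Nrout").show()

// Value frequencies for a non-numeric column such as IFAM.
flat.groupBy("IFAM").count().orderBy($"count".desc).show()
```

Is this roughly the right approach, or is there a simpler way?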
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-perform-basic-statistics-on-a-Json-file-to-explore-my-numeric-and-non-numeric-variables-tp24077.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org