Posted to user@spark.apache.org by SparknewUser <me...@gmail.com> on 2015/07/30 11:33:18 UTC

How to perform basic statistics on a JSON file to explore my numeric and non-numeric variables?

I've imported a JSON file, which has this schema:
    
    sqlContext.read.json("filename").printSchema
    root
     |-- COL: long (nullable = true)
     |-- DATA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- Crate: string (nullable = true)
     |    |    |-- MLrate: string (nullable = true)
     |    |    |-- Nrout: string (nullable = true)
     |    |    |-- up: string (nullable = true)
     |-- IFAM: string (nullable = true)
     |-- KTM: long (nullable = true)


I'm new to Spark and I want to perform basic statistics such as:
  * getting the min, max, mean, median and std of numeric variables
  * getting the value frequencies of non-numeric variables.

My questions are:
- How can I change the type of variables in my schema from 'string' to
numeric? (Crate, MLrate and Nrout should be numeric variables.)
- How can I do these basic statistics easily?
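A minimal sketch of what I imagine this might look like, assuming the Spark 1.4-era Scala DataFrame API (explode(), cast(), describe() and groupBy().count() are my guesses at the relevant calls; "filename" and the column names are just the ones from my schema above):

```scala
import sqlContext.implicits._
import org.apache.spark.sql.functions._

val df = sqlContext.read.json("filename")

// DATA is an array of structs, so explode it into one row per element,
// then cast the string fields to numeric (double) columns.
val flat = df
  .select($"COL", explode($"DATA").as("d"), $"IFAM", $"KTM")
  .select($"COL",
          $"d.Crate".cast("double").as("Crate"),
          $"d.MLrate".cast("double").as("MLrate"),
          $"d.Nrout".cast("double").as("Nrout"),
          $"d.up", $"IFAM", $"KTM")

// Basic statistics (count, mean, stddev, min, max) for the numeric columns:
flat.describe("Crate", "MLrate", "Nrout").show()

// Value frequencies for a non-numeric column:
flat.groupBy("IFAM").count().orderBy($"count".desc).show()
```

As far as I can tell, describe() does not report the median, so that part would presumably need something extra (e.g. Hive's percentile UDF through a HiveContext), but I'm not sure of the idiomatic way.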







--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-perform-basic-statistics-on-a-Json-file-to-explore-my-numeric-and-non-numeric-variables-tp24077.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org