Posted to user@spark.apache.org by Florian M <fl...@gmail.com> on 2015/08/20 12:10:11 UTC

[SparkR] How to perform a for loop on a DataFrame object

Hi guys, 

First of all, thank you for your amazing work.

As the subject says, I am posting here because I need to run a for loop over
a DataFrame object.

Here is a sample of my dataset (the entire dataset is ~400k lines long):

I am using Spark 1.4.1 with R 3.2.1.

I launch sparkR with the following command (the package can be found at
http://spark-packages.org/package/databricks/spark-csv ):



I load my dataset from HDFS using the following command (the package is
needed to load a CSV file into a Spark DataFrame):
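
For illustration only, a minimal sketch of such a load. It assumes Spark 1.4.x, where read.df takes the SQLContext as its first argument, and a sparkR shell started with the spark-csv package. The wrapper name load_csv and the HDFS path are hypothetical, and this is not necessarily the exact command used here.

```r
# Hypothetical wrapper: load a CSV file from HDFS as a SparkR DataFrame
# through the spark-csv data source. Assumes Spark 1.4.x, where read.df
# takes the SQLContext as its first argument.
load_csv <- function(sqlContext, path) {
  read.df(sqlContext, path,
          source = "com.databricks.spark.csv",
          header = "true")  # treat the first line as column names
}

# Usage inside a sparkR shell started with the spark-csv package
# (the path below is a placeholder, not the real location):
# df <- load_csv(sqlContext, "hdfs:///path/to/data.csv")
```

The wrapper only defines the call, so nothing touches the cluster until it is invoked from a running sparkR session.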



When I run summary on it, the output is:


What I need to calculate is:


But as you probably know, this does not work, because the read.df function
returns an S4 object, which is not iterable.

Does anyone know how I can do that?
Maybe I have to convert the DataFrame to another type, or use another
function to load my dataset...
I should say that I'm new to Spark and SparkR :)
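
A minimal sketch of the conversion idea, assuming the data fits in the driver's memory (plausible for ~400k lines, though that depends on the number of columns). SparkR's collect() returns an ordinary local R data.frame, which a for loop can index row by row. The helper row_sums and the columns a and b are made up for illustration; they stand in for whatever needs to be calculated.

```r
# Hypothetical helper (not part of SparkR): loop over the rows of a local
# R data.frame and sum the requested columns for each row.
row_sums <- function(local_df, cols) {
  out <- numeric(nrow(local_df))
  for (i in seq_len(nrow(local_df))) {
    out[i] <- sum(unlist(local_df[i, cols]))
  }
  out
}

# Stand-in for the result of collect(df); columns and values are made up.
local_df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
result <- row_sums(local_df, c("a", "b"))
print(result)

# In a sparkR session, the local copy would instead come from the
# distributed DataFrame (assumes it fits in the driver's memory):
# local_df <- collect(df)
```

The loop then runs on a single machine, so this trades Spark's parallelism for plain R semantics. Expressing the calculation with SparkR column operations would keep it distributed, where that is possible.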

Thanks for your time,

Florian




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-How-to-perform-a-for-loop-on-a-DataFrame-object-tp24359.html