Posted to user@spark.apache.org by Bene <be...@outlook.com> on 2016/09/10 08:44:31 UTC
SparkR API problem with subsetting distributed data frame
Hi,
I am having a problem with the SparkR API. I need to subset a distributed
data frame so that I can extract single values from it, on which I can then
do calculations.
Each row of my df has two numeric values, and I am creating a vector of new
values calculated as a series of sin, cos, and tan functions on these two
values. Does anyone have an idea how to do this in SparkR?
So far I have tried subsetting with [], [[]], and subset(), but mostly I get
the error
object of type 'S4' is not subsettable
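Roughly what I tried (dat is my SparkDataFrame):

```r
dat[1, 4]                # positional indexing, hoping for a single value
dat[[4]][[1]]            # double-bracket extraction
subset(dat, select = 4)
```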
Is there any way to do such a thing in SparkR? Any help would be greatly
appreciated! Also let me know if you need more information, code etc.
Thanks!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-API-problem-with-subsetting-distributed-data-frame-tp27688.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: SparkR API problem with subsetting distributed data frame
Posted by Felix Cheung <fe...@hotmail.com>.
How are you calling dirs()? What would x be? Is dat a SparkDataFrame?
With SparkR, the i in dat[i, 4] should be a logical expression selecting rows, e.g. df[df$age %in% c(19, 30), 1:2]
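For example, something like this (a sketch; the age column and the collect step are illustrative):

```r
# Row indexing takes a Column condition, not a row number;
# the result is again a SparkDataFrame:
adults <- df[df$age %in% c(19, 30), 1:2]

# To turn that into plain R values on the driver, collect() it first:
local <- collect(adults)
local[1, 1]   # ordinary R indexing works on the local data.frame
```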
On Sat, Sep 10, 2016 at 11:02 AM -0700, "Bene" <be...@outlook.com> wrote:
Here are a few code snippets:
The data frame looks like this:
  kfz        zeit                 datum       latitude  longitude
1 #########  2015-02-09 07:18:33  2015-02-09  52.35234  9.881965
2 #########  2015-02-09 07:18:34  2015-02-09  52.35233  9.881970
3 #########  2015-02-09 07:18:35  2015-02-09  52.35232  9.881975
4 #########  2015-02-09 07:18:36  2015-02-09  52.35232  9.881972
5 #########  2015-02-09 07:18:37  2015-02-09  52.35231  9.881973
6 #########  2015-02-09 07:18:38  2015-02-09  52.35231  9.881978
I call this function with a number (position in the data frame) and a data
frame:
dirs <- function(x, dat) {
  direction(startLat = dat[x, 4], endLat = dat[x + 1, 4],
            startLon = dat[x, 5], endLon = dat[x + 1, 5])
}
Here I get the "object of type 'S4' is not subsettable" error. This function
calls another function, which does the actual calculation:
# degrees.to.radians / radians.to.degrees are small helpers defined elsewhere
direction <- function(startLat, endLat, startLon, endLon) {
  startLat <- degrees.to.radians(startLat)
  startLon <- degrees.to.radians(startLon)
  endLat   <- degrees.to.radians(endLat)
  endLon   <- degrees.to.radians(endLon)

  dLon <- endLon - startLon
  dPhi <- log(tan(endLat / 2 + pi / 4) / tan(startLat / 2 + pi / 4))

  # wrap dLon into (-pi, pi]
  if (abs(dLon) > pi) {
    if (dLon > 0) {
      dLon <- -(2 * pi - dLon)
    } else {
      dLon <- 2 * pi + dLon
    }
  }

  # convert to degrees first, then normalize into [0, 360)
  bearing <- (radians.to.degrees(atan2(dLon, dPhi)) + 360) %% 360
  return(bearing)
}
Anything more you need?
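One way to keep this computation inside Spark, without positional row indexing, is to pair each row with the next one via a window function and then run the whole bearing formula column-wise. This is only a sketch; it assumes Spark >= 2.0 with SparkR attached, that zeit gives the row order, and the column names from your snippet:

```r
library(SparkR)

# Replace dat[x, ] / dat[x + 1, ] with a window plus lead():
w    <- windowOrderBy(dat$zeit)
dat2 <- withColumn(dat,  "endLat", over(lead(dat$latitude,  1L), w))
dat2 <- withColumn(dat2, "endLon", over(lead(dat$longitude, 1L), w))

rad <- function(col) col * (pi / 180)   # degrees -> radians, column-wise

dLon <- rad(dat2$endLon) - rad(dat2$longitude)
dPhi <- log(tan(rad(dat2$endLat)   / 2 + pi / 4) /
            tan(rad(dat2$latitude) / 2 + pi / 4))

# Wrap dLon into (-pi, pi], as in the if/else above, using column ifelse():
dLon <- ifelse(abs(dLon) > pi,
               ifelse(dLon > 0, dLon - 2 * pi, dLon + 2 * pi),
               dLon)

# atan2() and pmod() are defined for Columns, so nothing leaves Spark:
dat2 <- withColumn(dat2, "bearing",
                   pmod(atan2(dLon, dPhi) * (180 / pi) + 360, 360))
```

Note that the last row's bearing will be null, since lead() has no following row there.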
Re: SparkR API problem with subsetting distributed data frame
Posted by Bene <be...@outlook.com>.
I am calling dirs(x, dat) with a number for x and a distributed data frame
for dat, e.g. dirs(3, df).
With your logical expression, Felix, I would get another data frame, right?
That is not what I need; I need to extract a single value from a specific
cell for my calculations. Is that somehow possible?
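The closest I have found so far is to filter down to the row I want, collect() the (now tiny) result to the driver, and index it locally. Sketched here with an illustrative filter value; is this the intended pattern?

```r
# Filter to the row(s) of interest and keep only the needed column,
# then pull the small result back as a local data.frame:
row_df   <- collect(select(filter(dat, dat$zeit == "2015-02-09 07:18:33"),
                           "latitude"))
startLat <- row_df[1, "latitude"]   # ordinary R indexing works locally
```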
Re: SparkR API problem with subsetting distributed data frame
Posted by Felix Cheung <fe...@hotmail.com>.
Could you include code snippets you are running?