Posted to user@spark.apache.org by Bene <be...@outlook.com> on 2016/09/10 08:44:31 UTC

SparkR API problem with subsetting distributed data frame

Hi,

I am having a problem with the SparkR API. I need to subset a distributed
data frame so that I can extract single values from it and then do
calculations on them.

Each row of my df has two numeric values. I am creating a vector of new
values, each calculated as a series of sin, cos, and tan functions on these
two values. Does anyone have an idea how to do this in SparkR?

So far I have tried subsetting with [], [[]], and subset(), but I mostly get
the error

object of type 'S4' is not subsettable

Is there any way to do such a thing in SparkR? Any help would be greatly
appreciated! Also let me know if you need more information, code etc.
Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-API-problem-with-subsetting-distributed-data-frame-tp27688.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: SparkR API problem with subsetting distributed data frame

Posted by Felix Cheung <fe...@hotmail.com>.
How are you calling dirs()? What would be x? Is dat a SparkDataFrame?

With SparkR, i in dat[i, 4] should be a logical expression that selects rows, e.g. df[df$age %in% c(19, 30), 1:2]
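
An alternative to pulling out individual cells is to express the whole calculation on columns, so that it runs on the cluster. The following is a minimal sketch, not a tested solution for this data set. It assumes Spark 2.x, that rows can be ordered by the zeit column, and it uses a small made-up sample; the names ws, endLat, endLon and bearing are illustrative only.

```r
library(SparkR)
sparkR.session()  # assumes Spark 2.x and a local Spark installation

# Hypothetical sample standing in for the poster's data.
df <- createDataFrame(data.frame(
  zeit = c("2015-02-09 07:18:33", "2015-02-09 07:18:34", "2015-02-09 07:18:35"),
  latitude = c(52.35234, 52.35233, 52.35232),
  longitude = c(9.881965, 9.881970, 9.881975),
  stringsAsFactors = FALSE
))

# Pair each row with the next one by timestamp. An unpartitioned window moves
# all rows into a single partition, so real data would normally also be
# partitioned, for example by vehicle.
ws <- windowOrderBy(df$zeit)
df <- withColumn(df, "endLat", over(lead(df$latitude, 1), ws))
df <- withColumn(df, "endLon", over(lead(df$longitude, 1), ws))

# Same bearing formula as a Column expression; ifelse() replaces if/else.
rad <- pi / 180
dLon <- (df$endLon - df$longitude) * rad
dPhi <- log(tan(df$endLat * rad / 2 + pi / 4) /
            tan(df$latitude * rad / 2 + pi / 4))
dLon <- ifelse(abs(dLon) > pi,
               ifelse(dLon > 0, dLon - 2 * pi, dLon + 2 * pi),
               dLon)
df <- withColumn(df, "bearing", (atan2(dLon, dPhi) / rad + 360) %% 360)

# The last row has no successor, so its bearing is NA after collect().
result <- select(df, "zeit", "bearing")
out <- collect(arrange(result, result$zeit))
print(out)

sparkR.session.stop()
```

Only the final collect() brings data back into the local R session; everything before it stays a SparkDataFrame.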







Re: SparkR API problem with subsetting distributed data frame

Posted by Bene <be...@outlook.com>.
I am calling dirs(x, dat) with a number for x and a distributed data frame
for dat, like dirs(3, df).
With your logical expression, Felix, I would get another data frame, right?
That is not what I need. I need to extract the single value in a specific
cell for my calculations. Is that somehow possible?
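
One way this could work, as a rough sketch: filter the SparkDataFrame down to the row of interest and collect() that small result into a local R data.frame, where normal indexing applies. The sample data and the id column are hypothetical; a SparkDataFrame has no positional row index, so some column that identifies the row is assumed to exist.

```r
library(SparkR)
sparkR.session()  # assumes Spark 2.x and a local Spark installation

# Hypothetical sample with an explicit row id, since a SparkDataFrame
# has no positional row index of its own.
df <- createDataFrame(data.frame(
  id = 1:3,
  latitude = c(52.35234, 52.35233, 52.35232),
  longitude = c(9.881965, 9.881970, 9.881975)
))

# Filter with a Column predicate, then collect() the small result into a
# base R data.frame, where ordinary [row, column] indexing works again.
row3 <- collect(filter(df, df$id == 3))
start_lat <- row3[1, "latitude"]
print(start_lat)

sparkR.session.stop()
```

Each collect() runs a Spark job, so doing this once per row is likely to be slow on a large data set.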





Re: SparkR API problem with subsetting distributed data frame

Posted by Bene <be...@outlook.com>.
Here are a few code snippets:

The data frame looks like this:

        kfz                zeit      datum latitude longitude
1 ######### 2015-02-09 07:18:33 2015-02-09 52.35234  9.881965
2 ######### 2015-02-09 07:18:34 2015-02-09 52.35233  9.881970
3 ######### 2015-02-09 07:18:35 2015-02-09 52.35232  9.881975
4 ######### 2015-02-09 07:18:36 2015-02-09 52.35232  9.881972
5 ######### 2015-02-09 07:18:37 2015-02-09 52.35231  9.881973
6 ######### 2015-02-09 07:18:38 2015-02-09 52.35231  9.881978

I call this function with a number (position in the data frame) and a data
frame:

dirs <- function(x, dat) {
  direction(startLat = dat[x, 4], endLat = dat[x + 1, 4],
            startLon = dat[x, 5], endLon = dat[x + 1, 5])
}

This is where I get the "object of type 'S4' is not subsettable" error. The
function calls another function, which does the actual calculation:

direction <- function(startLat, endLat, startLon, endLon) {
  startLat <- degrees.to.radians(startLat)
  startLon <- degrees.to.radians(startLon)
  endLat <- degrees.to.radians(endLat)
  endLon <- degrees.to.radians(endLon)
  dLon <- endLon - startLon

  dPhi <- log(tan(endLat / 2 + pi / 4) / tan(startLat / 2 + pi / 4))
  # Take the shorter way around when the longitude difference exceeds pi.
  if (abs(dLon) > pi) {
    if (dLon > 0) {
      dLon <- -(2 * pi - dLon)
    } else {
      dLon <- (2 * pi + dLon)
    }
  }
  # Convert to degrees first, then shift the result into the range [0, 360).
  bearing <- (radians.to.degrees(atan2(dLon, dPhi)) + 360) %% 360
  return(bearing)
}
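
For comparison, here is a sketch of the same calculation vectorised over consecutive rows of a local R data.frame, for example one obtained with collect() from a SparkDataFrame small enough to fit in memory. The two conversion helpers are not shown in this thread, so their definitions below are assumptions, and the sample data is made up.

```r
# Assumed helper definitions; the thread does not show them.
degrees.to.radians <- function(deg) deg * pi / 180
radians.to.degrees <- function(rad) rad * 180 / pi

# Vectorised bearing: ifelse() replaces the scalar if/else so that
# whole columns can be passed in at once.
direction <- function(startLat, endLat, startLon, endLon) {
  startLat <- degrees.to.radians(startLat)
  endLat <- degrees.to.radians(endLat)
  dLon <- degrees.to.radians(endLon - startLon)
  dPhi <- log(tan(endLat / 2 + pi / 4) / tan(startLat / 2 + pi / 4))
  # Take the shorter way around when the difference exceeds 180 degrees.
  dLon <- ifelse(abs(dLon) > pi,
                 ifelse(dLon > 0, dLon - 2 * pi, dLon + 2 * pi),
                 dLon)
  (radians.to.degrees(atan2(dLon, dPhi)) + 360) %% 360
}

# Hypothetical local sample with the same coordinate columns as above.
dat <- data.frame(
  latitude = c(52.35234, 52.35233, 52.35232),
  longitude = c(9.881965, 9.881970, 9.881975)
)

# Pair row k with row k + 1 by dropping the last and the first element.
n <- nrow(dat)
bearings <- direction(
  startLat = dat$latitude[-n], endLat = dat$latitude[-1],
  startLon = dat$longitude[-n], endLon = dat$longitude[-1]
)
print(bearings)
```

This yields one bearing per pair of consecutive rows, so the result has one element fewer than the data frame has rows.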


Anything more you need?





Re: SparkR API problem with subsetting distributed data frame

Posted by Felix Cheung <fe...@hotmail.com>.
Could you include code snippets you are running?




