
Posted to issues@spark.apache.org by "RJ Nowling (JIRA)" <ji...@apache.org> on 2014/12/03 18:24:12 UTC

[jira] [Created] (SPARK-4727) Add "dimensional" RDDs (time series, spatial)

RJ Nowling created SPARK-4727:
---------------------------------

             Summary: Add "dimensional" RDDs (time series, spatial)
                 Key: SPARK-4727
                 URL: https://issues.apache.org/jira/browse/SPARK-4727
             Project: Spark
          Issue Type: Brainstorming
          Components: Spark Core
    Affects Versions: 1.1.0
            Reporter: RJ Nowling


Certain types of data (time series, spatial) can benefit from specialized RDDs.  I'd like to open a discussion about this.

For example, time series data should be ordered by time and would benefit from operations like:
* Subsampling (taking every n data points)
* Signal processing (correlations, FFTs, filtering)
* Windowing functions
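
As a rough illustration of two of the operations above (subsampling and windowing), here is a minimal single-machine sketch in plain Python. It does not use any Spark API; the (timestamp, value) record layout and the names subsample and sliding_mean are assumptions made for this example only. A distributed version would also need to handle windows that span partition boundaries, which this sketch ignores.

```python
from collections import deque
from typing import Iterable, List, Tuple

# Hypothetical record layout for this sketch: (timestamp, value).
Point = Tuple[float, float]


def subsample(points: Iterable[Point], n: int) -> List[Point]:
    """Order the points by timestamp, then keep every n-th one."""
    if n < 1:
        raise ValueError("n must be >= 1")
    ordered = sorted(points, key=lambda p: p[0])
    return ordered[::n]


def sliding_mean(points: Iterable[Point], width: int) -> List[Point]:
    """Mean over the last `width` values, emitted once the window is full.

    Each output point carries the timestamp of the newest value in its window.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    ordered = sorted(points, key=lambda p: p[0])
    window = deque(maxlen=width)  # oldest value is dropped automatically
    out = []
    for t, v in ordered:
        window.append(v)
        if len(window) == width:
            out.append((t, sum(window) / width))
    return out
```

Both functions sort first because neither operation is meaningful on unordered data, which is the reason an RDD that guarantees time ordering would help here.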

Spatial data benefits from ordering and partitioning along a 2D or 3D grid.  For example, path finding algorithms can be optimized by only comparing points within a set distance, which can be computed more efficiently by partitioning data into a grid.
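
To make the grid idea concrete, the following is a small sketch in plain Python (again not a Spark API; the function names grid_cell and pairs_within are hypothetical). It assumes 2D points and a cell size equal to the search radius, so any two points within that radius must fall in the same cell or in adjacent cells. Only those candidates are compared, instead of every pair.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def grid_cell(p: Point2D, cell_size: float) -> Tuple[int, int]:
    """Map a point to the integer coordinates of its grid cell."""
    return (floor(p[0] / cell_size), floor(p[1] / cell_size))


def pairs_within(points: Sequence[Point2D], radius: float) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j), i < j, whose distance is <= radius."""
    if radius <= 0:
        raise ValueError("radius must be positive")

    # Bucket point indices by grid cell; cell size equals the radius.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[grid_cell(p, radius)].append(i)

    result = set()
    for (cx, cy), members in grid.items():
        # Check the cell itself and its 8 neighbours.
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells while iterating over the grid.
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        (x1, y1), (x2, y2) = points[i], points[j]
                        if hypot(x1 - x2, y1 - y2) <= radius:
                            result.add((i, j))
    return sorted(result)
```

In a distributed setting the cell coordinates could serve as partition keys, with points near a cell border also made visible to the neighbouring partitions; that part is left out of this sketch.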

Although the operations on time series and spatial data may differ, both kinds of data have ordered dimensions, so their implementations may overlap.


