Posted to dev@spark.apache.org by RJ Nowling <rn...@gmail.com> on 2014/12/03 19:00:16 UTC

RDDs for "dimensional" (time series, spatial) data

Hi all,

I created a JIRA to discuss adding RDDs for "dimensional" data (not sure
what else to call it), such as time series and spatial data.  Spark could
be a better time series and/or spatial "database" than existing approaches.

https://issues.apache.org/jira/browse/SPARK-4727

I saw that MLlib supports some operations for time series in 1.2.0-rc1, but
I think that specialized RDDs could optimize the partitioning and
algorithms better than a regular RDD.  For example, spatial data could be
partitioned into a grid so that nearby points end up in the same partition.
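
To make the grid idea concrete, here is a rough sketch in plain Python (no
Spark dependency).  The helper name make_grid_partitioner, the bounding box,
and the 2x2 grid are all assumptions for illustration only; nothing like
this exists in Spark as far as I know.

```python
# Hypothetical helper (not a Spark API): builds a function that maps a
# 2-D point to the id of the grid cell containing it.
def make_grid_partitioner(x_min, x_max, y_min, y_max, cols, rows):
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows

    def cell_of(point):
        x, y = point
        # Clamp so points on or outside the bounding box fall into an edge cell.
        col = min(max(int((x - x_min) / cell_w), 0), cols - 1)
        row = min(max(int((y - y_min) / cell_h), 0), rows - 1)
        # Row-major cell id in the range [0, cols * rows).
        return row * cols + col

    return cell_of


# Assumed example: a 10 x 10 bounding box split into a 2 x 2 grid.
cell_of = make_grid_partitioner(0.0, 10.0, 0.0, 10.0, cols=2, rows=2)
points = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0), (10.0, 10.0)]
cells = [cell_of(p) for p in points]
print(cells)
```

In PySpark, a function like cell_of could presumably be passed as the
partitionFunc argument of RDD.partitionBy on a pair RDD keyed by (x, y),
with numPartitions set to cols * rows, so that each grid cell maps to one
partition.  A specialized RDD could go further, for example by handling
skewed cells, but that is beyond this sketch.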

Any feedback would be great!

Thanks,
RJ Nowling

-- 
em rnowling@gmail.com