Posted to issues@spark.apache.org by "Alexander Ulanov (JIRA)" <ji...@apache.org> on 2015/05/05 03:17:06 UTC

[jira] [Updated] (SPARK-7316) Add step capability to RDD sliding window

     [ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Ulanov updated SPARK-7316:
------------------------------------
    Description: 
RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. Users should be able to define the step. This capability should be implemented.

Although one can generate sliding windows with step 1 and then keep only every Nth window, this can take much more time and disk space, depending on the step size. For example, with a window of 1000, the windows generated with step 1 hold roughly a thousand times as much data as the initial dataset. This is wasteful if only every Nth window is needed: with a step of N, the generated data would be only about 1000/N times the size of the initial dataset, that is, N times smaller.
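
The size argument above can be checked with a small sketch. It uses plain Python lists rather than Spark, and the helper name sliding and its step parameter are hypothetical illustrations, not the RDDFunctions API. It compares generating every window with step 1 and filtering afterwards against generating only the needed windows directly:

```python
def sliding(items, window, step=1):
    """Yield lists of `window` consecutive items, starting every `step` items.

    Hypothetical helper for illustration only; it is not the MLlib API.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    for start in range(0, len(items) - window + 1, step):
        yield items[start:start + window]


data = list(range(1000))
window, step = 100, 10

# Approach 1: materialize every window (step 1), then keep every step-th one.
all_windows = list(sliding(data, window))
filtered = all_windows[::step]

# Approach 2: generate only the windows that are actually needed.
direct = list(sliding(data, window, step))

# Both approaches yield the same windows, but approach 1 materializes
# about `step` times more elements along the way.
elements_step1 = sum(len(w) for w in all_windows)
elements_direct = sum(len(w) for w in direct)
print(len(all_windows), len(direct))    # 901 91
print(elements_step1, elements_direct)  # 90100 9100
```

In this toy case the step-1 approach materializes about 90 times the input size, while the direct approach materializes about 9 times the input size (window/step), which matches the ratio described above.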



  was:RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. Users should be able to define the step. This capability should be implemented.


> Add step capability to RDD sliding window
> -----------------------------------------
>
>                 Key: SPARK-7316
>                 URL: https://issues.apache.org/jira/browse/SPARK-7316
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Alexander Ulanov
>             Fix For: 1.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. Users should be able to define the step. This capability should be implemented.
> Although one can generate sliding windows with step 1 and then keep only every Nth window, this can take much more time and disk space, depending on the step size. For example, with a window of 1000, the windows generated with step 1 hold roughly a thousand times as much data as the initial dataset. This is wasteful if only every Nth window is needed: with a step of N, the generated data would be only about 1000/N times the size of the initial dataset, that is, N times smaller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
