Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2014/03/17 01:03:16 UTC

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/136#issuecomment-37776246
  
    @pwendell @mridulm, RDD.sliding is a public method in this PR. If we don't want users to treat it as a cheap operation, how about moving it to a separate RDDFunctions class and asking users to explicitly import it before use?
    
    @mateiz for n-grams, it doesn't hurt to drop the n-grams that span partition boundaries, as long as each partition contains many words. But for numerical integration, we cannot ignore the boundaries.
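
To make the boundary issue concrete, here is a minimal sketch in plain Python. It is not Spark code and not the implementation in this PR: partitions are modeled as ordinary lists, and every function name below is hypothetical. It applies the trapezoidal rule to y = x on [0, 4], once with windows that cross the partition boundary and once with windows computed inside each partition only.

```python
from itertools import chain


def sliding(seq, n):
    """Return all length-n windows over seq, in order."""
    seq = list(seq)
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def sliding_per_partition(partitions, n):
    """Windows computed independently in each partition (boundary windows are lost)."""
    return [w for p in partitions for w in sliding(p, n)]


def sliding_across_partitions(partitions, n):
    """Windows over the concatenated data, i.e. including those that span a boundary."""
    return sliding(chain.from_iterable(partitions), n)


def trapezoid(windows):
    """Trapezoidal rule summed over windows of two consecutive (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in windows)


# y = x sampled at x = 0..4, split into two hypothetical partitions.
partitions = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(3.0, 3.0), (4.0, 4.0)],
]

exact = trapezoid(sliding_across_partitions(partitions, 2))  # all 4 intervals
lossy = trapezoid(sliding_per_partition(partitions, 2))      # misses [2, 3]
```

With this data, `exact` is 8.0 (the true integral) while `lossy` is 5.5, because the interval between x = 2 and x = 3 falls on the partition boundary and is dropped. For n-grams the same loss is only a handful of windows per partition, which is why it matters much less there.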
    
    I put another implementation at https://github.com/mengxr/spark/blob/sliding-new/core/src/main/scala/org/apache/spark/rdd/SlidingRDD.scala
    
    How about this approach?

