You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Victor Tso-Guillen <vt...@paxata.com> on 2014/09/12 02:00:27 UTC

Backwards RDD

Iterating an RDD gives you each partition in order of their split index.
I'd like to be able to get each partition in reverse order, but I'm having
difficultly implementing the compute() method. I thought I could do
something like this:

  override def getDependencies: Seq[Dependency[_]] = {
    Seq(new NarrowDependency[T](prev) {
      def getParents(partitionId: Int): Seq[Int] = {
        Seq(prev.partitions.size - partitionId - 1)
      }
    })
  }

  override def compute(split: Partition, context: TaskContext): Iterator[T]
= {
    firstParent[T].iterator(split, context).toArray.reverseIterator
  }

But that doesn't work. How do I get one split to depend on exactly one
split from the parent that does not match indices?

Re: Backwards RDD

Posted by Victor Tso-Guillen <vt...@paxata.com>.
I'm now making the Backwards RDD take the previous RDD's partitions and
then using those to iterate. Passes my test. Is it kosher?

On Thu, Sep 11, 2014 at 5:00 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:

> Iterating an RDD gives you each partition in order of their split index.
> I'd like to be able to get each partition in reverse order, but I'm having
> difficultly implementing the compute() method. I thought I could do
> something like this:
>
>   override def getDependencies: Seq[Dependency[_]] = {
>     Seq(new NarrowDependency[T](prev) {
>       def getParents(partitionId: Int): Seq[Int] = {
>         Seq(prev.partitions.size - partitionId - 1)
>       }
>     })
>   }
>
>   override def compute(split: Partition, context: TaskContext):
> Iterator[T] = {
>     firstParent[T].iterator(split, context).toArray.reverseIterator
>   }
>
> But that doesn't work. How do I get one split to depend on exactly one
> split from the parent that does not match indices?
>