Posted to user@spark.apache.org by Deep Pradhan <pr...@gmail.com> on 2014/09/24 06:33:45 UTC

Converting one RDD to another

Hi,
Is it always possible to get one RDD from another?
For example, if I do a *top(K)(Ordering....)*, I get an Int, right? (In my
example the type is Int.) I do not get an RDD.
Can anyone explain this to me?
Thank You

Re: Converting one RDD to another

Posted by Sean Owen <so...@cloudera.com>.
top returns the specified number of "largest" elements in your RDD.
They are returned to the driver as an Array (an Array[Int] in your case,
not a single Int). If you want to make an RDD out of them again, call
SparkContext.parallelize(...). Make sure this is what you mean, though.
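
A minimal sketch of that round trip, assuming a local master and made-up
sample data. The object name TopToRdd and the helper topAsRdd are
hypothetical names for illustration only, not part of Spark. top is an
action, so its result lives on the driver; parallelize distributes it again.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TopToRdd {
  // Hypothetical helper: take the k largest elements and turn them back into an RDD.
  def topAsRdd(sc: SparkContext, rdd: RDD[Int], k: Int): RDD[Int] = {
    // top is an action: the result is a local Array[Int] on the driver, not an RDD.
    val largest: Array[Int] = rdd.top(k)
    // parallelize distributes the local collection across the cluster again.
    sc.parallelize(largest.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Master URL and app name are assumptions for a standalone local run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("top-to-rdd")
    val sc = new SparkContext(conf)
    try {
      val numbers: RDD[Int] = sc.parallelize(Seq(5, 1, 9, 3, 7, 2))
      println(numbers.top(3).mkString(", "))       // prints "9, 7, 5"
      println(topAsRdd(sc, numbers, 3).count())    // prints 3
    } finally {
      sc.stop()
    }
  }
}
```

If K is small, the Array is usually all you need; going back to an RDD only
pays off when later steps have to run on the cluster.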

Re: Converting one RDD to another

Posted by Zhan Zhang <zz...@hortonworks.com>.
Here is my understanding. top(K) calls takeOrdered with the ordering
reversed, and takeOrdered looks like this:

  def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = {
    if (num == 0) { // if 0, return an empty array
      Array.empty
    } else {
      // Map each partition to a new one whose iterator consists of a single
      // queue holding at most num elements.
      mapPartitions { items =>
        // Priority keeps the largest elements, so let's reverse the ordering.
        val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
        queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
        Iterator.single(queue)
      }.reduce { (queue1, queue2) =>
        // runJob is called here to collect the elements of the RDD, which are
        // actually one queue per partition, and merge them.
        queue1 ++= queue2
        queue1
      }.toArray.sorted(ord) // convert to an array and sort
    }
  }
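
The same two-step idea (keep at most num elements per partition, then merge
the per-partition results) can be sketched with plain Scala collections.
This is only an illustration under stated assumptions: partitions are
modelled as a Seq[Seq[T]], sorting stands in for Spark's internal
BoundedPriorityQueue, and the names TakeOrderedSketch and smallest are
hypothetical.

```scala
object TakeOrderedSketch {
  // Keep the num smallest elements of one "partition" according to ord.
  // Sorting is used here instead of a bounded priority queue, for brevity.
  def smallest[T](items: Iterator[T], num: Int)(implicit ord: Ordering[T]): List[T] =
    items.toList.sorted(ord).take(num)

  // Stand-in for RDD.takeOrdered over partitions modelled as Seq[Seq[T]].
  def takeOrdered[T](partitions: Seq[Seq[T]], num: Int)(implicit ord: Ordering[T]): List[T] =
    if (num == 0 || partitions.isEmpty) Nil
    else
      partitions
        .map(p => smallest(p.iterator, num))                 // like mapPartitions: at most num per partition
        .reduce((a, b) => smallest((a ++ b).iterator, num))  // like reduce: merge, keep at most num

  // Stand-in for RDD.top: the same computation with the ordering reversed.
  def top[T](partitions: Seq[Seq[T]], num: Int)(implicit ord: Ordering[T]): List[T] =
    takeOrdered(partitions, num)(ord.reverse)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(5, 1, 9), Seq(3, 7), Seq(2, 8))
    println(takeOrdered(partitions, 3)) // prints List(1, 2, 3)
    println(top(partitions, 3))         // prints List(9, 8, 7)
  }
}
```

Either way the final result is a local collection, which is why top(K) does
not give you an RDD.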


