You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Deep Pradhan <pr...@gmail.com> on 2014/09/24 06:33:45 UTC
Converting one RDD to another
Hi,
Is it always possible to get one RDD from another.
For example, if I do a *top(K)(Ordering....)*, I get an Int right? (In my
example the type is Int). I do not get an RDD.
Can anyone explain this to me?
Thank You
Re: Converting one RDD to another
Posted by Sean Owen <so...@cloudera.com>.
top returns the specified number of "largest" elements in your RDD.
They are returned to the driver as an Array. If you want to make an
RDD out of them again, call SparkContext.parallelize(...). Make sure
this is what you mean though.
On Wed, Sep 24, 2014 at 5:33 AM, Deep Pradhan <pr...@gmail.com> wrote:
> Hi,
> Is it always possible to get one RDD from another.
> For example, if I do a top(K)(Ordering....), I get an Int right? (In my
> example the type is Int). I do not get an RDD.
> Can anyone explain this to me?
> Thank You
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Converting one RDD to another
Posted by Zhan Zhang <zz...@hortonworks.com>.
Here is my understanding
def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = {
if (num == 0) { //if 0, return empty array
Array.empty
} else {
mapPartitions { items => //map each partition to a a new one with the iterator consists of the single queue, which has num of elements.
// Priority keeps the largest elements, so let's reverse the ordering.
val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
Iterator.single(queue)
}.reduce { (queue1, queue2) => //runJob is called here to collect all the element from rdd, which is actually a queue from each partition.
queue1 ++= queue2
queue1
}.toArray.sorted(ord) //to array and sort
}
}
On Sep 23, 2014, at 9:33 PM, Deep Pradhan <pr...@gmail.com> wrote:
> Hi,
> Is it always possible to get one RDD from another.
> For example, if I do a top(K)(Ordering....), I get an Int right? (In my example the type is Int). I do not get an RDD.
> Can anyone explain this to me?
> Thank You
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.