Posted to user@spark.apache.org by Matt Cheah <mc...@palantir.com> on 2013/10/25 00:23:55 UTC

Take last k elements from RDD?

Hi everyone,

I see there is a take() function for RDDs, getting the first n elements. Is there a way to get the last n elements?

Thanks,

-Matt Cheah

Re: Take last k elements from RDD?

Posted by Matei Zaharia <ma...@gmail.com>.
Are you doing this because the RDD is sorted somehow, or because you have a file and want its last K elements? For that you could probably use the lower-level SparkContext.runJob() API to run a job on just the last partition and then return the last elements from there. I'm just curious how general this need is.

Matei
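
[Editor's sketch] The runJob() suggestion above might look roughly like the following. This is a minimal sketch, not code from the thread: takeLast is a hypothetical helper name, and it assumes a recent Spark release in which SparkContext.runJob(rdd, func, partitions) takes exactly these three arguments (the signature available when this thread was written may differ). Because the last partition can hold fewer than k elements, the sketch walks backwards over partitions until it has enough.

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object TakeLast {
  // Hypothetical helper, not an RDD method: returns up to k trailing
  // elements of rdd, in their original order.
  def takeLast[T: ClassTag](sc: SparkContext, rdd: RDD[T], k: Int): Array[T] = {
    var collected = Vector.empty[T]
    var partition = rdd.partitions.length - 1
    // Walk backwards from the last partition until k elements are collected
    // or there are no partitions left.
    while (k > 0 && collected.size < k && partition >= 0) {
      val needed = k - collected.size
      // Run a job on a single partition. Note that toArray materializes the
      // whole partition on the executor before the tail is taken.
      val results: Array[Array[T]] =
        sc.runJob(rdd, (it: Iterator[T]) => it.toArray.takeRight(needed), Seq(partition))
      collected = results.head.toVector ++ collected
      partition -= 1
    }
    collected.toArray
  }
}
```

The result is only meaningful if the RDD has a defined order, for example after a sort or when it was read from a file.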


Re: Take last k elements from RDD?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
That's because a generic Iterator doesn't necessarily have a definite size
(cf. Iterator.hasDefiniteSize), so it is not always possible to know where
the end of the Iterator is or how to take the last k elements from it.
Indeed, an Iterator could be over an infinite stream and have no end or
last k elements.
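
[Editor's sketch] For an Iterator that is known to be finite, the only general way to get the last k elements is to consume it to the end while remembering the k most recent values. A minimal sketch of that idea follows; lastK is a hypothetical helper name, not part of Spark or the Scala library, and it would never return on an infinite Iterator, which is exactly the problem described above.

```scala
import scala.collection.mutable

// Hypothetical helper: walks the iterator once and keeps at most k elements
// in memory. For k <= 0 the iterator is left untouched and Nil is returned.
def lastK[A](it: Iterator[A], k: Int): List[A] = {
  val buffer = mutable.Queue.empty[A]
  if (k > 0) {
    it.foreach { x =>
      buffer.enqueue(x)
      // Drop the oldest element once the buffer grows past k.
      if (buffer.size > k) buffer.dequeue()
    }
  }
  buffer.toList
}
```

For example, lastK(Iterator(1, 2, 3, 4, 5), 2) returns List(4, 5).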


On Thu, Oct 24, 2013 at 3:59 PM, Stoney Vintson <st...@gmail.com> wrote:

> I don't see common Iterable trait methods such as takeRight() or last() in
> the Spark Scala API documentation. There are sampling and sorting methods:
> sample, sortByKey.
>
> Spark Scala API:
>
> http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#rdd-operations
>
> org.apache.spark.rdd
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD
>
> spark-0.8.0-incubating/core/src/main/scala/org/apache/spark/rdd
>
> Also, remember that in Scala, tail returns all of the elements except
> the head, so it does not give you only the last elements.

Re: Take last k elements from RDD?

Posted by Stoney Vintson <st...@gmail.com>.
I don't see common Iterable trait methods such as takeRight() or last() in
the Spark Scala API documentation. There are sampling and sorting methods:
sample, sortByKey.

Spark Scala API:
http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#rdd-operations

org.apache.spark.rdd
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD

spark-0.8.0-incubating/core/src/main/scala/org/apache/spark/rdd

Also, remember that in Scala, tail returns all of the elements except
the head, so it does not give you only the last elements.
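
[Editor's sketch] On an ordinary Scala collection (not an RDD), the difference between these methods can be seen in a few lines. The values here are made up purely for illustration.

```scala
val xs = List(1, 2, 3, 4, 5)

val allButFirst = xs.tail      // everything except the head: List(2, 3, 4, 5)
val lastTwo = xs.takeRight(2)  // the last two elements: List(4, 5)
val lastOne = xs.last          // the final element: 5
```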

