Posted to user@spark.apache.org by Philip Ogren <ph...@oracle.com> on 2014/04/14 17:24:15 UTC

RDD.tail()

Has there been any thought to adding a tail() method to RDD?  It would 
be really handy to skip over the first item in an RDD when it contains 
header information.  Even better would be a drop(int) function that 
would allow you to skip over several lines of header information.  Our 
attempts to do something equivalent with a filter() call seem a bit 
contorted.  Any thoughts?

Thanks,
Philip

Re: RDD.tail()

Posted by Matei Zaharia <ma...@gmail.com>.

You can use mapPartitionsWithIndex and look at the partition index (0 will be the first partition) to decide whether to skip the first line.

Matei
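
[Editor's sketch] A minimal sketch of the mapPartitionsWithIndex suggestion above, not code from the thread. The names DropHeader and dropFromFirstPartition, the sample data, and the local[2] master are hypothetical. It assumes the RDD is ordered and that all header lines fall inside partition 0, which would not hold if, for example, several files each carry their own header.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DropHeader {
  // Drops the first `n` elements of partition 0 only; every other
  // partition is passed through unchanged. (Hypothetical helper.)
  def dropFromFirstPartition[T](index: Int, iter: Iterator[T], n: Int): Iterator[T] =
    if (index == 0) iter.drop(n) else iter

  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setAppName("drop-header").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for a text file whose first line is a header.
      val lines = sc.parallelize(Seq("name,age", "alice,30", "bob,25"), 2)
      // Skip one header line, looking only at the first partition.
      val body = lines.mapPartitionsWithIndex { (idx, it) =>
        dropFromFirstPartition(idx, it, 1)
      }
      body.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Passing a larger n gives the drop(int) behavior asked about in the original post, as long as all n header lines sit in the first partition.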

On Apr 14, 2014, at 8:50 AM, Ethan Jewett <es...@gmail.com> wrote:

> We have similar needs but IIRC, I came to the conclusion that this would only work on ordered RDDs, and then you would still have to figure out which partition is the first one. I ended up deciding it would be best to just drop the header lines from a Scala iterator before creating an RDD based on it. Not sure if this was the "right" thing to do, but would that work for you?
> 
> Regards,
> Ethan

Re: RDD.tail()

Posted by Ethan Jewett <es...@gmail.com>.

We have similar needs but IIRC, I came to the conclusion that this would
only work on ordered RDDs, and then you would still have to figure out
which partition is the first one. I ended up deciding it would be best to
just drop the header lines from a Scala iterator before creating an RDD
based on it. Not sure if this was the "right" thing to do, but would that
work for you?

Regards,
Ethan
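
[Editor's sketch] A minimal sketch of the approach described above: drop the header lines from a plain Scala iterator on the driver, then build the RDD from what remains. This is not code from the thread. The names DropHeaderBeforeRdd and withoutHeader, the in-memory sample lines, and the local[2] master are hypothetical. Because the lines are materialized on the driver, this only suits input that fits in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DropHeaderBeforeRdd {
  // Skips the first `n` lines of a local iterator and materializes the rest.
  // (Hypothetical helper.)
  def withoutHeader(lines: Iterator[String], n: Int): Vector[String] =
    lines.drop(n).toVector

  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setAppName("drop-header-before-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for lines read locally, e.g. from scala.io.Source.getLines().
      val raw = Iterator("name,age", "alice,30", "bob,25")
      // The header is already gone by the time the RDD is created,
      // so no partition bookkeeping is needed.
      val rdd = sc.parallelize(withoutHeader(raw, 1))
      rdd.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```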
