Posted to user@spark.apache.org by Silvina Caíno Lores <si...@gmail.com> on 2014/07/16 12:01:57 UTC

Reading file header in Spark

Hi everyone!

I'm really new to Spark and I'm trying to figure out what would be the
proper way to do the following:

1.- Read a file header (a single line)
2.- Build a configuration object from it
3.- Use that object in a function that will be called by map()

I thought about using filter() after textFile(), but I don't want an RDD
as the result, since I'm expecting a single object.
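
To make this concrete, here is roughly what I have in mind (just a sketch;
HeaderConfig and parseRecord are made-up placeholder names):

    val lines = sc.textFile("data.txt")
    // 1.- Read the header (a single line) on the driver
    val header = lines.first()
    // 2.- Build a configuration object from it; it must be serializable,
    //     since it is shipped to the workers inside the map closure
    val config = HeaderConfig.parse(header)
    // 3.- Use that object in the function passed to map(), skipping the
    //     header line itself (this assumes no data line equals the header)
    val records = lines.filter(_ != header)
                       .map(line => parseRecord(line, config))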

Any help is much appreciated.

Thanks in advance,
Silvina

Re: Reading file header in Spark

Posted by Silvina Caíno Lores <si...@gmail.com>.
Thank you! This is what I needed. I've read that first() should work as
well. It's a pity that the taken element cannot be removed from the RDD,
though.

Thanks again!
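
In case it helps someone else: one workaround I have seen for actually
dropping the header from the RDD (an untested sketch; it assumes the header
sits in the first partition, which holds for a plain textFile() read) is:

    val noHeader = lines.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 0) iter.drop(1) else iter
    }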


On 16 July 2014 12:09, Sean Owen <so...@cloudera.com> wrote:

> You can rdd.take(1) to get just the header line.
>
> I think someone mentioned before that this is a good use case for
> having a "tail" method on RDDs too, to skip the header for subsequent
> processing. But you can ignore it with a filter, or logic in your map
> method.
>
> On Wed, Jul 16, 2014 at 11:01 AM, Silvina Caíno Lores
> <si...@gmail.com> wrote:
> > Hi everyone!
> >
> > I'm really new to Spark and I'm trying to figure out what would be the
> > proper way to do the following:
> >
> > 1.- Read a file header (a single line)
> > 2.- Build a configuration object from it
> > 3.- Use that object in a function that will be called by map()
> >
> > I thought about using filter() after textFile(), but I don't want an RDD
> > as the result, since I'm expecting a single object.
> >
> > Any help is much appreciated.
> >
> > Thanks in advance,
> > Silvina
>

Re: Reading file header in Spark

Posted by Sean Owen <so...@cloudera.com>.
You can rdd.take(1) to get just the header line.

I think someone mentioned before that this is a good use case for
having a "tail" method on RDDs too, to skip the header for subsequent
processing. But you can ignore it with a filter, or logic in your map
method.
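
For example, something along these lines (an untested sketch):

    val lines = sc.textFile("data.txt")
    // take(1) returns an Array, so grab its single element on the driver
    val header = lines.take(1)(0)
    // drop the header for subsequent processing (or branch on it in map)
    val data = lines.filter(_ != header)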

On Wed, Jul 16, 2014 at 11:01 AM, Silvina Caíno Lores
<si...@gmail.com> wrote:
> Hi everyone!
>
> I'm really new to Spark and I'm trying to figure out what would be the
> proper way to do the following:
>
> 1.- Read a file header (a single line)
> 2.- Build a configuration object from it
> 3.- Use that object in a function that will be called by map()
>
> I thought about using filter() after textFile(), but I don't want an RDD
> as the result, since I'm expecting a single object.
>
> Any help is much appreciated.
>
> Thanks in advance,
> Silvina