Posted to user@spark.apache.org by Sarath Chandra <sa...@algofusiontech.com> on 2014/09/21 11:26:31 UTC

Saving RDD with array of strings

Hi All,

If my RDD contains arrays/sequences of strings, how can I save them to an
HDFS file with each string on a separate line?

For example, if I write code like the following, the output should be saved
as an HDFS file with one string per line:
...
...
var newLines = lines.map(line => myfunc(line));
newLines.saveAsTextFile(hdfsPath);
...
...
def myfunc(line: String):Array[String] = {
  line.split(";");
}

Thanks,
~Sarath.

Re: Saving RDD with array of strings

Posted by Julien Carme <ju...@gmail.com>.
Just use flatMap; it does exactly what you need:

newLines.flatMap { lines => lines }.saveAsTextFile(...)
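
To spell that out a bit: with map, each element of the RDD is a whole
Array[String], and saveAsTextFile writes the toString of each element, so you
get one array reference per line rather than the strings themselves. flatMap
flattens every array into its individual strings first. Below is a minimal
sketch of a complete job, assuming a local master and input/output paths
passed as arguments. The object name, the app name, and the use of args(0)
and args(1) are placeholders of mine, not something from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, used here only for illustration.
object SaveOneStringPerLine {

  // Same splitting logic as myfunc in the question.
  def myfunc(line: String): Array[String] = line.split(";")

  def main(args: Array[String]): Unit = {
    // Assumption: local master for testing; on a cluster you would normally
    // leave the master to spark-submit instead of hard-coding it.
    val conf = new SparkConf()
      .setAppName("SaveOneStringPerLine")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // args(0): input path, args(1): output path (placeholders).
      val lines = sc.textFile(args(0)) // RDD[String]

      // flatMap applies myfunc and flattens each Array[String],
      // so the result is an RDD[String] with one field per element.
      val fields = lines.flatMap(line => myfunc(line))

      // Each element is written on its own line in the part files.
      fields.saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

If you already have newLines as an RDD of arrays, the one-liner above is
equivalent. Applying flatMap directly to the original lines just avoids
building the intermediate RDD.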

