Posted to user@spark.apache.org by Sarath Chandra <sa...@algofusiontech.com> on 2014/09/21 11:26:31 UTC
Saving RDD with array of strings
Hi All,
If my RDD contains arrays/sequences of strings, how can I save them as an
HDFS file with each string on a separate line?
For example, if I write the code below, the output should be saved as an HDFS
file with one string per line:
...
...
var newLines = lines.map(line => myfunc(line));
newLines.saveAsTextFile(hdfsPath);
...
...
def myfunc(line: String): Array[String] = {
  line.split(";")
}
Thanks,
~Sarath.
Re: Saving RDD with array of strings
Posted by Julien Carme <ju...@gmail.com>.
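Just use flatMap; it does exactly what you need. It flattens each array so
that every string becomes its own RDD element, and saveAsTextFile then writes
one element per line: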
newLines.flatMap { lines => lines }.saveAsTextFile(...)
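To make the difference between map and flatMap concrete, here is a minimal sketch that uses a plain Scala Seq as a stand-in for the RDD, so it runs without a Spark installation. The object name FlatMapSketch, the helper splitAll, and the sample records are made up for illustration; only myfunc comes from the question. Scala collections and RDDs should behave the same way here, but treat the Spark line in the comment as an untested assumption.

```scala
object FlatMapSketch {
  // Same splitting helper as in the question.
  def myfunc(line: String): Array[String] = line.split(";")

  // Hypothetical helper: flattens every split array into one sequence of strings.
  // The assumed Spark equivalent is: lines.flatMap(myfunc).saveAsTextFile(hdfsPath)
  def splitAll(lines: Seq[String]): Seq[String] =
    lines.flatMap(line => myfunc(line))

  def main(args: Array[String]): Unit = {
    // Stand-in for the RDD; these sample records are illustrative only.
    val lines = Seq("a;b;c", "d;e")

    // map keeps one Array[String] per input record (2 elements here).
    val nested: Seq[Array[String]] = lines.map(myfunc)
    println(s"map yields ${nested.size} arrays")

    // flatMap yields one element per string (5 elements here).
    val flat: Seq[String] = splitAll(lines)
    flat.foreach(println)
  }
}
```

With map, each saved line would be the toString of an array rather than its contents, which is why the flattening step is needed before saving.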