Posted to user@spark.apache.org by "Afshartous, Nick" <na...@turbine.com> on 2016/05/04 20:09:08 UTC
Writing output of key-value Pair RDD
Hi,
Is there any way to write the values of a key-value pair RDD out to S3?
I'd like each value of a pair to be written to its own file, where the file name corresponds to the key name.
Thanks,
--
Nick
Re: Writing output of key-value Pair RDD
Posted by "Afshartous, Nick" <na...@turbine.com>.
Answering my own question.
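I filtered the keys out of the output files by overriding
MultipleOutputFormat.generateActualKey
to return null. Returning the empty string also removes the key text, but Hadoop's text line writer only skips the tab separator when the key is null, so each line would still start with a tab.
--
Nick
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Writes each (key, value) pair to an output file named after its key,
// with only the value on each line.
class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat<String, String> {
    @Override
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        // Use the key as the file name instead of the default part-NNNNN name.
        return key;
    }

    @Override
    protected String generateActualKey(String key, String value) {
        // A null key makes the line writer omit both the key and the tab separator.
        return null;
    }
}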
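Whether the key column disappears completely depends on how the line writer behind MultipleTextOutputFormat treats the key that generateActualKey returns. The plain-Java sketch below is a simplified, hypothetical model (the class LineFormatSketch and method formatLine are illustrative names, not Hadoop APIs) of the null handling in the line writer of Hadoop's TextOutputFormat: a null key is skipped together with the separator, while any non-null key, including the empty string, is written and followed by the separator. NullWritable handling is left out so the sketch needs no Hadoop dependency.

```java
// Hypothetical, simplified model of how Hadoop's TextOutputFormat line writer
// formats one record. Plain Java only; NullWritable handling is omitted.
final class LineFormatSketch {
    static final String SEPARATOR = "\t"; // TextOutputFormat's default key/value separator

    /** Returns the text written for one (key, value) record. */
    static String formatLine(String key, String value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing is written at all
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key); // "" appends no text, but still counts as a key
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR); // separator only when both key and value are present
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("a", "Abcd"));  // key, tab, value
        System.out.print(formatLine("", "Abcd"));   // leading tab, then value
        System.out.print(formatLine(null, "Abcd")); // value only
    }
}
```

Under this model, a null key from generateActualKey yields lines that contain only the value, while an empty-string key yields each value prefixed by a tab.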
________________________________
From: Afshartous, Nick <na...@turbine.com>
Sent: Thursday, May 5, 2016 3:35:17 PM
To: Nicholas Chammas; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD
Thanks, I got the example below working, though it writes both the keys and the values to the output files.
Is there any way to write just the values?
--
Nick
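// sc is a JavaSparkContext; pairFunction (not shown) is a
// PairFunction<String, String, String> that turns each string into a (key, value) pair.
String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };
sc.parallelize(Arrays.asList(strings))
    .mapToPair(pairFunction)
    .saveAsHadoopFile("s3://...", String.class, String.class, RDDMultipleTextOutputFormat.class);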
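The pairFunction used in this example is not shown in the thread, so the keys it produces are unknown. As a purely hypothetical stand-in, the plain-Java sketch below keys each string by its lowercased first letter and groups the values the way a key-named, values-only output format would route them: one file per key, containing only the values. It models the resulting layout in memory (a map from file name to lines) rather than writing to S3; KeyedOutputSketch, toPair and layout are illustrative names, not Spark or Hadoop APIs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of the output layout only; no Spark, Hadoop or S3 involved.
final class KeyedOutputSketch {
    // Hypothetical stand-in for the thread's pairFunction (not shown there):
    // the key is the string's first letter, lowercased; the value is the string.
    static Map.Entry<String, String> toPair(String s) {
        return new AbstractMap.SimpleImmutableEntry<>(
                s.substring(0, 1).toLowerCase(Locale.ROOT), s);
    }

    // File name -> lines in that file. Values sharing a key land in the
    // same file, and only the value is written on each line.
    static Map<String, List<String>> layout(List<String> input) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String s : input) {
            Map.Entry<String, String> pair = toPair(s);
            files.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                 .add(pair.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };
        layout(Arrays.asList(strings))
                .forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

With this hypothetical key function, main prints `a -> [Abcd, Azlksd, aDxa]` followed by `w -> [whhd, wasc]`, i.e. a file named `a` holding three values and a file named `w` holding two.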
________________________________
From: Nicholas Chammas <ni...@gmail.com>
Sent: Wednesday, May 4, 2016 4:21:12 PM
To: Afshartous, Nick; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD
You're looking for this discussion: http://stackoverflow.com/q/23995040/877069
Also, a simpler alternative with DataFrames: https://github.com/apache/spark/pull/8375#issuecomment-202458325
On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick <na...@turbine.com> wrote:
Hi,
Is there any way to write the values of a key-value pair RDD out to S3?
I'd like each value of a pair to be written to its own file, where the file name corresponds to the key name.
Thanks,
--
Nick