Posted to user@spark.apache.org by "Afshartous, Nick" <na...@turbine.com> on 2016/05/04 20:09:08 UTC

Writing output of key-value Pair RDD

Hi,


Is there any way to write the values of a key-value pair RDD out to S3?


I'd like each value of a pair to be written to its own file where the file name corresponds to the key name.


Thanks,

--

    Nick

Re: Writing output of key-value Pair RDD

Posted by "Afshartous, Nick" <na...@turbine.com>.
Answering my own question.


I filtered out the keys from the output file by overriding


  MultipleOutputFormat.generateActualKey


to return the empty string.

--

    Nick


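import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Writes each record to a file named after its key and blanks out the key in the file contents.
class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat<String, String> {

    // Use the key itself as the output file name.
    @Override
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        return key;
    }

    // Replace the key that is written alongside the value with an empty string.
    @Override
    protected String generateActualKey(String key, String value) {
        return "";
    }

}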
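
A caveat that may be worth checking against your Hadoop version: the stock line-oriented text writer appears to skip the key only when it is null (or a NullWritable), so an empty-string key can still be followed by the separator, leaving a leading tab on each line. The snippet below is a simplified stand-alone model of that behavior, not Hadoop code; TextLineModel and formatLine are hypothetical names used only for illustration.

```java
// Simplified model (NOT Hadoop source) of how a line-oriented text record
// writer is assumed to treat the key: it is omitted only when it is null.
final class TextLineModel {
    // Assumed default key/value separator for Hadoop text output.
    static final String SEPARATOR = "\t";

    /** Builds the line that would be written for one record. */
    static String formatLine(String key, String value) {
        StringBuilder line = new StringBuilder();
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (hasKey) {
            line.append(key);
        }
        if (hasKey && hasValue) {
            line.append(SEPARATOR);
        }
        if (hasValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Empty-string key: the separator is still emitted before the value.
        System.out.println("[" + formatLine("", "Abcd") + "]");
        // Null key: only the value is written.
        System.out.println("[" + formatLine(null, "Abcd") + "]");
    }
}
```

If the output files do show a leading tab, returning null from generateActualKey instead of "" is a variant worth trying.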

________________________________
From: Afshartous, Nick <na...@turbine.com>
Sent: Thursday, May 5, 2016 3:35:17 PM
To: Nicholas Chammas; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD



Thanks, I got the example below working, though it writes both the keys and the values to the output file.

Is there any way to write just the values?

--

    Nick


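// Requires java.util.Arrays. pairFunction (presumably a PairFunction<String, String, String>)
// is defined elsewhere and is not shown in this message.
String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };

// Key each string, then write the pairs through the custom output format.
sc.parallelize(Arrays.asList(strings))
        .mapToPair(pairFunction)
        .saveAsHadoopFile("s3://...", String.class, String.class, RDDMultipleTextOutputFormat.class);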
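
Since pairFunction is not shown in the message, the intended file layout can be easier to see in a plain-Java stand-in that involves neither Spark nor S3. The sketch below is only an illustration under stated assumptions: the class name, the keyOf rule (lower-cased first letter) and the use of a local temp directory are all hypothetical and do not come from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Local-filesystem stand-in for the Spark job: no Spark or S3 involved.
final class ValuesPerKeyFileSketch {

    // Hypothetical replacement for pairFunction: key = lower-cased first letter.
    static String keyOf(String value) {
        return value.substring(0, 1).toLowerCase(Locale.ROOT);
    }

    /** Groups values by key and writes each group to outputDir/<key>, one value per line. */
    static Map<String, List<String>> writeByKey(List<String> values, Path outputDir)
            throws IOException {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String value : values) {
            grouped.computeIfAbsent(keyOf(value), k -> new ArrayList<>()).add(value);
        }
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // The file name is the key; the file content is the values only.
            Files.write(outputDir.resolve(entry.getKey()), entry.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) throws IOException {
        List<String> strings = Arrays.asList("Abcd", "Azlksd", "whhd", "wasc", "aDxa");
        Path dir = Files.createTempDirectory("values-per-key");
        System.out.println(writeByKey(strings, dir).keySet() + " written under " + dir);
    }
}
```

With this keying rule the five sample strings end up in two files, one per distinct key, and neither file contains the key itself.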


________________________________
From: Nicholas Chammas <ni...@gmail.com>
Sent: Wednesday, May 4, 2016 4:21:12 PM
To: Afshartous, Nick; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD

You're looking for this discussion: http://stackoverflow.com/q/23995040/877069

Also, a simpler alternative with DataFrames: https://github.com/apache/spark/pull/8375#issuecomment-202458325

Re: Writing output of key-value Pair RDD

Posted by Nicholas Chammas <ni...@gmail.com>.
You're looking for this discussion:
http://stackoverflow.com/q/23995040/877069

Also, a simpler alternative with DataFrames:
https://github.com/apache/spark/pull/8375#issuecomment-202458325
