Posted to user@spark.apache.org by 付雅丹 <ya...@gmail.com> on 2015/07/07 16:18:56 UTC

How to write a MapReduce program in Spark using Java on a user-defined JavaPairRDD?

Hi, everyone!

I've got <key, value> pairs in the form <LongWritable, Text>, which I loaded
with the following code:

SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
JavaSparkContext sc = new JavaSparkContext(conf);
Configuration confHadoop = new Configuration();

JavaPairRDD<LongWritable, Text> sourceFile = sc.newAPIHadoopFile(
    "hdfs://cMaster:9000/wcinput/data.txt",
    DataInputFormat.class, LongWritable.class, Text.class, confHadoop);

Now I want to transform the JavaPairRDD from <LongWritable, Text> into
another <LongWritable, Text>, where the Text content is different. After
that, I want to write the Text values to HDFS, ordered by the LongWritable
key. But I don't know how to write the map and reduce functions in Spark
using Java. Can someone help me?


Sincerely,
Missie.

Re: How to write a MapReduce program in Spark using Java on a user-defined JavaPairRDD?

Posted by Feynman Liang <fl...@databricks.com>.
Hi Missie,

In the Java API, you should consider:

   1. RDD.map
      <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#map(scala.Function1,%20scala.reflect.ClassTag)>
      to transform the text
   2. RDD.sortBy
      <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#sortBy(scala.Function1,%20boolean,%20int,%20scala.math.Ordering,%20scala.reflect.ClassTag)>
      to order by LongWritable
   3. RDD.saveAsTextFile
      <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#saveAsTextFile(java.lang.String)>
      to write to HDFS
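Put together, the three steps might look like the sketch below (assumptions: the sourceFile RDD comes from the original post; the class name, output path, and the uppercasing transform are placeholders for whatever change the Text content actually needs). On a JavaPairRDD the pair-flavored counterparts are mapToPair and sortByKey. One pitfall worth noting: Hadoop's LongWritable and Text do not implement java.io.Serializable, so converting them to plain Long and String before the shuffle avoids serialization errors.

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class MapReduceFileOutput {

    // The per-record transformation, kept as a plain static method so it can
    // be tested without a cluster. Uppercasing is just a placeholder for the
    // real change to the Text content.
    public static String transform(String line) {
        return line.toUpperCase();
    }

    // Sketch of the pipeline, assuming sourceFile is the
    // JavaPairRDD<LongWritable, Text> built in the original post.
    public static void run(JavaPairRDD<LongWritable, Text> sourceFile) {
        JavaPairRDD<Long, String> result = sourceFile
            // 1. map: unwrap the Writables into serializable Java types
            //    and rewrite the text content
            .mapToPair(pair -> new Tuple2<>(pair._1().get(),
                                            transform(pair._2().toString())))
            // 2. sort by the (former) LongWritable key
            .sortByKey();
        // 3. write to HDFS (hypothetical output directory)
        result.saveAsTextFile("hdfs://cMaster:9000/wcoutput");
    }
}
```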

