Posted to user@spark.apache.org by jane thorpe <ja...@aol.com.INVALID> on 2020/03/31 20:31:47 UTC
HDFS file
hi,
Are there setup instructions on the website for spark-3.0.0-preview2-bin-hadoop2.7 so that I can run the same program against HDFS?
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
val textFile = sc.textFile("/data/README.md")val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _)counts.saveAsTextFile("/data/wordcount")
textFile: org.apache.spark.rdd.RDD[String] = /data/README.md MapPartitionsRDD[23] at textFile at <console>:28
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at reduceByKey at <console>:31
br
Jane
Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt
Posted by jane thorpe <ja...@aol.com.INVALID>.
Thanks darling
I tried this and it worked:
hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000
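(The input file also has to exist at that HDFS path before the job runs; getting it there is just something like hdfs dfs -mkdir -p /hdfs/spark/examples followed by hdfs dfs -put with a local copy of README.txt.)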
scala> :paste
// Entering paste mode (ctrl-D to finish)
val textFile = sc.textFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README7.out")
// Exiting paste mode, now interpreting.
textFile: org.apache.spark.rdd.RDD[String] = hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <pastie>:27
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <pastie>:30
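A quick way to eyeball the result before quitting (not shown in the run above) is counts.take(10).foreach(println) in the shell, or hdfs dfs -cat on the part files under README7.out once the save has finished.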
scala> :quit
jane thorpe
janethorpe1@aol.com
-----Original Message-----
From: Som Lima <so...@gmail.com>
CC: user <us...@spark.apache.org>
Sent: Tue, 31 Mar 2020 23:06
Subject: Re: HDFS file
Hi Jane
Try this example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
Som
On Tue, 31 Mar 2020, 21:34 jane thorpe, <ja...@aol.com.invalid> wrote:
hi,
Are there setup instructions on the website for spark-3.0.0-preview2-bin-hadoop2.7 so that I can run the same program against HDFS?
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
val textFile = sc.textFile("/data/README.md")val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _)counts.saveAsTextFile("/data/wordcount")
textFile: org.apache.spark.rdd.RDD[String] = /data/README.md MapPartitionsRDD[23] at textFile at <console>:28
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at reduceByKey at <console>:31
br
Jane
Re: HDFS file
Posted by Som Lima <so...@gmail.com>.
Hi Jane
Try this example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
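In outline it watches an HDFS directory with Spark Streaming and word-counts every new file that appears there. A rough sketch of the same idea, with the watched directory below only a placeholder:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 2-second micro-batches; count the words in each new file that lands in the directory
val sparkConf = new SparkConf().setAppName("HdfsWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val lines = ssc.textFileStream("hdfs://localhost:9000/hdfs/spark/streaming-in")
val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()

Run it as a standalone app rather than pasting it into a shell that already has a SparkContext, or build the StreamingContext from the existing sc with new StreamingContext(sc, Seconds(2)).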
Som
On Tue, 31 Mar 2020, 21:34 jane thorpe, <ja...@aol.com.invalid> wrote:
> hi,
>
> Are there setup instructions on the website for
> spark-3.0.0-preview2-bin-hadoop2.7 so that I can run the same program
> against HDFS?
>
> val textFile = sc.textFile("hdfs://...")
> val counts = textFile.flatMap(line => line.split(" "))
>   .map(word => (word, 1))
>   .reduceByKey(_ + _)
> counts.saveAsTextFile("hdfs://...")
>
>
>
> val textFile = sc.textFile("/data/README.md")
> val counts = textFile.flatMap(line => line.split(" "))
>   .map(word => (word, 1))
>   .reduceByKey(_ + _)
> counts.saveAsTextFile("/data/wordcount")
>
> textFile: org.apache.spark.rdd.RDD[String] = /data/README.md
> MapPartitionsRDD[23] at textFile at <console>:28
>
> counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at reduceByKey at <console>:31
>
> br
> Jane
>