Posted to user@spark.apache.org by Zsombor Egyed <eg...@starschema.net> on 2015/10/13 20:31:49 UTC

Machine learning with spark (book code example error)

Hi!

I was reading the ML with Spark book, and I was very interested in
chapter 9 (text mining), so I tried the code examples.

Everything was fine, but in this line:

val testLabels = testRDD.map {
case (file, text) => val topic = file.split("/").takeRight(2).head
newsgroupsMap(topic) }

I got an error: "value newsgroupsMap is not a member of String"

Other relevant parts of the code:
val path = "/PATH/20news-bydate-train/*"
val rdd = sc.wholeTextFiles(path)
val newsgroups = rdd.map { case (file, text) => file.split("/").takeRight(2).head }

val tf = hashingTF.transform(tokens)
val idf = new IDF().fit(tf)
val tfidf = idf.transform(tf)

val newsgroupsMap = newsgroups.distinct.collect().zipWithIndex.toMap
val zipped = newsgroups.zip(tfidf)
val train = zipped.map { case (topic, vector) => LabeledPoint(newsgroupsMap(topic), vector) }
train.cache

val model = NaiveBayes.train(train, lambda = 0.1)

val testPath = "/PATH//20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) => val topic = file.split("/").takeRight(2).head newsgroupsMap(topic) }

I attached the whole program code.
Can anyone help me figure out what the problem is?

Regards,
Zsombor

Re: Machine learning with spark (book code example error)

Posted by Fengdong Yu <fe...@everstring.com>.

I don't recommend this code style; it's better to put braces around the function body.

val testLabels = testRDD.map { case (file, text) => {
  val topic = file.split("/").takeRight(2).head
  newsgroupsMap(topic)
} }
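
For anyone who pastes this as a single line: as far as I can tell, the braces by themselves do not separate the two statements. The line break does, or a semicolon if everything stays on one line. Below is a minimal sketch of the one-line form. It assumes a plain Seq in place of testRDD and a made-up two-entry newsgroupsMap, and the object name and file paths are hypothetical, so it runs without Spark.

```scala
object BracedOneLiner {
  // Hypothetical stand-ins (not from the book): a tiny label map and a
  // plain Seq of (file, text) pairs in place of the thread's testRDD.
  val newsgroupsMap: Map[String, Int] = Map("alt.atheism" -> 0, "sci.space" -> 1)
  val testFiles: Seq[(String, String)] = Seq(
    ("/PATH/20news-bydate-test/sci.space/60001", "some text"),
    ("/PATH/20news-bydate-test/alt.atheism/53068", "more text")
  )

  // Braced body kept on one line: the semicolon ends the `val topic`
  // statement, so `newsgroupsMap(topic)` is read as a separate expression.
  val testLabels: Seq[Int] = testFiles.map { case (file, text) => { val topic = file.split("/").takeRight(2).head; newsgroupsMap(topic) } }

  def main(args: Array[String]): Unit = println(testLabels)
}
```

The same semicolon should also work without the inner braces; the braces mainly make the extent of the function body easier to see.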


> On Oct 14, 2015, at 15:46, Nick Pentreath <ni...@gmail.com> wrote:
> 
> Hi there. I'm the author of the book (thanks for buying it by the way :)
> 
> Ideally if you're having any trouble with the book or code, it's best to contact the publisher and submit a query (https://www.packtpub.com/books/content/support/17400)
> 
> However, I can help with this issue. The problem is that the "testLabels" code needs to be indented over multiple lines:
> 
> val testPath = "/PATH/20news-bydate-test/*"
> val testRDD = sc.wholeTextFiles(testPath)
> val testLabels = testRDD.map { case (file, text) => 
> 	val topic = file.split("/").takeRight(2).head
> 	newsgroupsMap(topic)
> }
> 
> As it is in the sample code attached. If you copy the whole indented block (or line by line) into the console, it should work - I've tested all the sample code again and indeed it works for me.
> 
> Hope this helps
> Nick
> 

Re: Machine learning with spark (book code example error)

Posted by Nick Pentreath <ni...@gmail.com>.

Hi there. I'm the author of the book (thanks for buying it by the way :)

Ideally if you're having any trouble with the book or code, it's best to
contact the publisher and submit a query
(https://www.packtpub.com/books/content/support/17400)

However, I can help with this issue. The problem is that the "testLabels"
code needs to be indented over multiple lines:

val testPath = "/PATH/20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) =>
  val topic = file.split("/").takeRight(2).head
  newsgroupsMap(topic)
}

This is how it appears in the attached sample code. If you copy the whole
indented block (or paste it line by line) into the console, it should work.
I've tested all the sample code again, and it does work for me.
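
To spell out why the line break matters, here is a small sketch that runs without Spark. It assumes a made-up two-entry newsgroupsMap and a hypothetical file path; the object name and the labelOf helper are only for illustration and are not from the book. When both statements sit on one line, Scala reads `head newsgroupsMap(topic)` as an infix method call on the String returned by `head`, which matches the reported error.

```scala
object NewlineMatters {
  // Hypothetical stand-in for the map built from the training set.
  val newsgroupsMap: Map[String, Int] = Map("alt.atheism" -> 0, "sci.space" -> 1)

  // Hypothetical path in the layout <root>/<newsgroup>/<message id>.
  val file = "/PATH/20news-bydate-test/alt.atheism/53068"

  // On a single line the compiler parses
  //   val topic = file.split("/").takeRight(2).head newsgroupsMap(topic)
  // as
  //   val topic = file.split("/").takeRight(2).head.newsgroupsMap(topic)
  // and String has no member called newsgroupsMap, hence the error.

  // On separate lines these are two statements: the first takes the last two
  // path segments and keeps the first of them (the newsgroup directory), and
  // the second looks up its numeric label.
  def labelOf(file: String): Int = {
    val topic = file.split("/").takeRight(2).head
    newsgroupsMap(topic)
  }

  def main(args: Array[String]): Unit = {
    println(labelOf(file)) // prints 0
  }
}
```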

Hope this helps
Nick
