Posted to user@spark.apache.org by Zsombor Egyed <eg...@starschema.net> on 2015/10/13 20:31:49 UTC
Machine learning with spark (book code example error)
Hi!
I was reading the ML with Spark book and was very interested in
chapter 9 (text mining), so I tried the code examples.
Everything was fine, but in this line:
val testLabels = testRDD.map {
case (file, text) => val topic = file.split("/").takeRight(2).head
newsgroupsMap(topic) }
I got an error: "value newsgroupsMap is not a member of String"
Other relevant part of the code:
val path = "/PATH/20news-bydate-train/*"
val rdd = sc.wholeTextFiles(path)
val newsgroups = rdd.map { case (file, text) =>
file.split("/").takeRight(2).head }
val tf = hashingTF.transform(tokens)
val idf = new IDF().fit(tf)
val tfidf = idf.transform(tf)
val newsgroupsMap = newsgroups.distinct.collect().zipWithIndex.toMap
val zipped = newsgroups.zip(tfidf)
val train = zipped.map { case (topic, vector) => LabeledPoint(newsgroupsMap(topic), vector) }
train.cache
val model = NaiveBayes.train(train, lambda = 0.1)
val testPath = "/PATH//20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) => val topic =
file.split("/").takeRight(2).head newsgroupsMap(topic) }
I attached the whole program code.
Can anyone help me find out what the problem is?
Regards,
Zsombor
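For readers following along, here is a minimal sketch of what the label-extraction expression in the post does, using a plain Scala Seq instead of an RDD so it runs without Spark. The file paths and the object name LabelFromPath are made up for illustration; only the split/takeRight/head expression and the distinct/zipWithIndex/toMap chain come from the post.

```scala
object LabelFromPath {
  // Hypothetical paths in the layout .../<newsgroup>/<file id> (not from the post)
  val files = Seq(
    "/data/20news-bydate-train/sci.space/60151",
    "/data/20news-bydate-train/sci.space/60152",
    "/data/20news-bydate-train/rec.autos/101551"
  )

  // Same expression as in the post: keep the last two path segments
  // (directory name and file name), then take the first of them.
  def topicOf(file: String): String = file.split("/").takeRight(2).head

  val newsgroups: Seq[String] = files.map(topicOf)

  // Assign each distinct newsgroup name an integer index to use as a class label.
  val newsgroupsMap: Map[String, Int] = newsgroups.distinct.zipWithIndex.toMap

  def main(args: Array[String]): Unit = {
    newsgroups.foreach(println)
    println(newsgroupsMap)
  }
}
```

On an RDD the same chain needs the collect() call shown in the post before zipWithIndex.toMap, because the map has to live on the driver.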
Re: Machine learning with spark (book code example error)
Posted by Fengdong Yu <fe...@everstring.com>.
I don't recommend this code style; it's better to wrap the function body in braces.
val testLabels = testRDD.map { case (file, text) => {
val topic = file.split("/").takeRight(2).head
newsgroupsMap(topic)
} }
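As a small sketch of the layout question, the example below shows the two statements of the function body separated either by line breaks or, on a single line, by an explicit semicolon. It uses a plain Seq of (file, text) pairs instead of an RDD so that it runs without Spark; the sample data and the object name OneLineBlock are assumptions for illustration.

```scala
object OneLineBlock {
  // Hypothetical stand-ins for the values built earlier in the thread
  val newsgroupsMap = Map("sci.space" -> 0, "rec.autos" -> 1)
  val testFiles = Seq(
    ("/test/sci.space/1", "some text"),
    ("/test/rec.autos/2", "other text")
  )

  // Two statements separated by line breaks
  val multiLine = testFiles.map { case (file, text) =>
    val topic = file.split("/").takeRight(2).head
    newsgroupsMap(topic)
  }

  // The same two statements on one line, separated by an explicit semicolon
  val oneLine = testFiles.map { case (file, text) => val topic = file.split("/").takeRight(2).head; newsgroupsMap(topic) }

  def main(args: Array[String]): Unit = {
    println(multiLine)
    println(oneLine)
  }
}
```

Either form should give the compiler a clear boundary between the val definition and the map lookup; extra braces are a style choice on top of that.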
> On Oct 14, 2015, at 15:46, Nick Pentreath <ni...@gmail.com> wrote:
>
> Hi there. I'm the author of the book (thanks for buying it by the way :)
>
> Ideally if you're having any trouble with the book or code, it's best to contact the publisher and submit a query (https://www.packtpub.com/books/content/support/17400)
>
> However, I can help with this issue. The problem is that the "testLabels" code needs to be split over multiple lines:
>
> val testPath = "/PATH/20news-bydate-test/*"
> val testRDD = sc.wholeTextFiles(testPath)
> val testLabels = testRDD.map { case (file, text) =>
> val topic = file.split("/").takeRight(2).head
> newsgroupsMap(topic)
> }
>
> As it is in the sample code attached. If you copy the whole indented block (or line by line) into the console, it should work - I've tested all the sample code again and indeed it works for me.
>
> Hope this helps
> Nick
Re: Machine learning with spark (book code example error)
Posted by Nick Pentreath <ni...@gmail.com>.
Hi there. I'm the author of the book (thanks for buying it by the way :)
Ideally if you're having any trouble with the book or code, it's best to
contact the publisher and submit a query (
https://www.packtpub.com/books/content/support/17400)
However, I can help with this issue. The problem is that the "testLabels"
code needs to be split over multiple lines:
val testPath = "/PATH/20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) =>
val topic = file.split("/").takeRight(2).head
newsgroupsMap(topic)
}
As it is in the sample code attached. If you copy the whole indented block
(or line by line) into the console, it should work - I've tested all the
sample code again and indeed it works for me.
Hope this helps
Nick
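The error message in the original post is consistent with Scala's infix notation: when two statements end up on one line with nothing between them, an expression of the form a m (b) is read as a.m(b). So the line ending in .head newsgroupsMap(topic) is presumably parsed as .head.newsgroupsMap(topic), which asks the String returned by head for a member called newsgroupsMap. The sketch below shows the same parsing rule with a method that does exist on String; the sample data and the object name InfixParse are made up for illustration.

```scala
object InfixParse {
  val topics = List("sci.space", "rec.autos")

  // Infix notation: `a m (b)` on a single line means `a.m(b)`.
  val infix = topics.head concat ("!")

  // The equivalent call written with an explicit dot.
  val dotted = topics.head.concat("!")

  // By analogy, `file.split("/").takeRight(2).head newsgroupsMap(topic)` is read
  // as `file.split("/").takeRight(2).head.newsgroupsMap(topic)`, and String has
  // no member named newsgroupsMap, hence the compiler error in the post.

  def main(args: Array[String]): Unit = {
    println(infix)
    println(infix == dotted)
  }
}
```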