Posted to user@spark.apache.org by Irene Markelic <ir...@markelic.de> on 2022/05/05 17:30:25 UTC
groupby question
Hi everybody,
I have an RDD that I want to group according to some key, but it just
doesn't work. I am a Scala beginner. So I have the following RDD:
langs: List[String]
rdd: RDD[WikipediaArticle]
val meinVal = rdd.flatMap(article => langs.map(lang => {
  if (article.mentionsLanguage(lang)) { Tuple2(lang, article) }
  else { None }
})).filter(_ != None)
meinVal.collect.foreach(println) gives:
(Scala,WikipediaArticle(2,Scala and Java run on the JVM))
(Java,WikipediaArticle(2,Scala and Java run on the JVM))
(Scala,WikipediaArticle(3,Scala is not purely functional))
I have two questions:
1) Why can I not apply the groupByKey function? It is an RDD that
contains tuples, and the first tuple entry is the key.
2) I don't see how to apply groupby either. I thought I could do
meinVal.groupby(x=> x._1), but that throws an error.
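For both questions, a minimal sketch that uses plain Scala collections in place of an RDD (an assumption, so that it runs without Spark). The WikipediaArticle class and its mentionsLanguage method below are hypothetical stand-ins modeled on the names in this post. In Spark, groupByKey is only offered on an RDD whose element type is a pair (K, V), so the sketch builds real (String, WikipediaArticle) pairs by filtering the languages first, instead of returning either a tuple or None.

```scala
// Hypothetical stand-in for the WikipediaArticle class used in the post.
case class WikipediaArticle(id: Int, text: String) {
  // Assumed behaviour: true if the text contains the language name as a word.
  def mentionsLanguage(lang: String): Boolean = text.split(' ').contains(lang)
}

object GroupSketch {
  val langs: List[String] = List("Scala", "Java", "Haskell")
  val articles: List[WikipediaArticle] = List(
    WikipediaArticle(2, "Scala and Java run on the JVM"),
    WikipediaArticle(3, "Scala is not purely functional")
  )

  // Keep only the languages an article mentions, then pair each with the article.
  // Every element is a (String, WikipediaArticle), so the element type is a tuple.
  val pairs: List[(String, WikipediaArticle)] =
    articles.flatMap(article =>
      langs.filter(lang => article.mentionsLanguage(lang)).map(lang => (lang, article)))

  // Grouping on the first tuple entry type-checks because the elements are tuples.
  val grouped: Map[String, List[WikipediaArticle]] =
    pairs.groupBy(_._1).map { case (lang, ps) => (lang, ps.map(_._2)) }

  def main(args: Array[String]): Unit =
    grouped.foreach(println)
}
```

With an RDD, the same flatMap body should give an RDD[(String, WikipediaArticle)], and groupByKey becomes available on it through Spark's pair-RDD functions.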
I notice that when I use an IDE and hover over "meinVal", it shows that
it is RDD[Object], whereas it should be RDD[(String,WikipediaArticle)]. I
do not know how to get this information without the IDE. So it seems
that the RDD contains just one big object. I just don't see why that is.
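One way to check the inferred type without an IDE is to write the expected type on the val and let the compiler confirm or reject it; the Scala REPL and spark-shell also print the type of each val they define. The following is a minimal sketch with plain Scala collections standing in for the RDD (an assumption, so that it runs without Spark; all names in it are hypothetical). An if/else that yields a tuple in one branch and None in the other does not have a tuple type. The compiler falls back to a common supertype of both branches, so members such as _1 are not available on the elements.

```scala
object TypeCheckSketch {
  val langs: List[String] = List("Scala", "Java")
  val text: String = "Scala is not purely functional"

  // One branch yields a tuple, the other yields None, so the element type
  // is only a common supertype of both. Ascribing List[Any] compiles.
  val mixed: List[Any] =
    langs.map(lang => if (text.contains(lang)) (lang, text) else None)

  // Uncommenting the next line fails to compile, which reveals the problem
  // without an IDE: the elements are not known to be tuples.
  // val wrong: List[(String, String)] = mixed

  // Wrapping the tuple in Some makes both branches an Option, and flatMap
  // then drops the None values and unwraps the rest.
  val pairs: List[(String, String)] =
    langs.flatMap(lang => if (text.contains(lang)) Some((lang, text)) else None)

  def main(args: Array[String]): Unit = {
    println(mixed)
    println(pairs)
  }
}
```

The same ascription trick should carry over to the RDD case: annotating meinVal as RDD[(String, WikipediaArticle)] would make the compiler report the mismatch at the line where the wrong type is produced.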
Anyone? Please?
Irene
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: groupby question
Posted by wilson <wi...@4shield.net>.
I don't understand what you are trying to express.
It would help if you could give a sample dataset and describe what you want
to achieve; then we may be able to give the right solution.
Thanks
Irene Markelic wrote:
> I have an RDD that I want to group according to some key, but it just
> doesn't work. I am a Scala beginner. So I have the following RDD: