Posted to user@spark.apache.org by Irene Markelic <ir...@markelic.de> on 2022/05/05 17:30:25 UTC

groupby question

Hi everybody,

I have an RDD that I want to group according to some key, but it just 
doesn't work. I am a Scala beginner. I have the following:


langs: List[String]

rdd: RDD[WikipediaArticle]

val meinVal = rdd.flatMap(article => langs.map(lang => {
  if (article.mentionsLanguage(lang)) { Tuple2(lang, article) }
  else { None }
})).filter(_ != None)

meinVal.collect.foreach(println) gives:

(Scala,WikipediaArticle(2,Scala and Java run on the JVM))
(Java,WikipediaArticle(2,Scala and Java run on the JVM))
(Scala,WikipediaArticle(3,Scala is not purely functional))


I have two questions:

1) Why can I not apply the groupByKey function? It is an RDD of 
tuples, and the first tuple entry is the key.

2) I don't see how to apply groupby either. I thought I could do 
meinVal.groupby(x=> x._1), but that throws an error.
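
For reference, here is a minimal sketch of what the grouping looks like once the elements really are typed as (String, WikipediaArticle) pairs. It uses a plain Scala List as a stand-in for the RDD so that it runs without Spark, and the WikipediaArticle case class is a hypothetical reconstruction based on the printed output above. Note that the method is spelled groupBy, with a capital B, on Scala collections and on RDDs alike.

```scala
// Hypothetical stand-in for the WikipediaArticle class from the post.
case class WikipediaArticle(id: Int, text: String)

object GroupingSketch {
  // Properly typed pairs, mirroring the printed output in the post.
  val pairs: List[(String, WikipediaArticle)] = List(
    ("Scala", WikipediaArticle(2, "Scala and Java run on the JVM")),
    ("Java", WikipediaArticle(2, "Scala and Java run on the JVM")),
    ("Scala", WikipediaArticle(3, "Scala is not purely functional"))
  )

  // groupBy keeps the whole pair inside each group.
  val byLang: Map[String, List[(String, WikipediaArticle)]] =
    pairs.groupBy(_._1)

  // Keeping only the values per key gives the shape that groupByKey
  // produces on a pair RDD (key -> collection of values).
  val articlesByLang: Map[String, List[WikipediaArticle]] =
    byLang.map { case (lang, group) => (lang, group.map(_._2)) }

  def main(args: Array[String]): Unit =
    articlesByLang.foreach(println)
}
```

In Spark itself, groupByKey is only offered on an RDD whose element type is a pair (K, V), so both calls depend on the compiler seeing tuples rather than a more general element type.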

I notice that when I use an IDE and hover over "meinVal", it shows that 
it is RDD[Object], whereas it should be RDD[(String,WikipediaArticle)]. I 
do not know how to get this information without the IDE. So it seems 
that the RDD contains just one big object. I just don't see why that is.
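
A likely explanation, sketched below under stated assumptions: the if branch yields a tuple and the else branch yields None, so the compiler has to infer a common supertype of both for the elements, and the tuple type is lost. The sketch uses plain Scala Lists instead of an RDD so that it runs without Spark; mentionsLanguage is a hypothetical implementation, since the post does not show the real one. Writing the expected type on the val is one way to check the element type without an IDE, because the code then fails to compile if the inferred type differs.

```scala
case class WikipediaArticle(id: Int, text: String) {
  // Hypothetical implementation; the post does not show the real one.
  def mentionsLanguage(lang: String): Boolean =
    text.split(' ').contains(lang)
}

object TypeSketch {
  val langs = List("Scala", "Java")
  val articles = List(
    WikipediaArticle(1, "Groovy is pretty interesting"),
    WikipediaArticle(2, "Scala and Java run on the JVM"),
    WikipediaArticle(3, "Scala is not purely functional")
  )

  // One branch yields a tuple and the other yields None, so the compiler
  // infers a common supertype, not (String, WikipediaArticle).
  val mixed = articles.flatMap(a => langs.map(l =>
    if (a.mentionsLanguage(l)) (l, a) else None)).filter(_ != None)

  // A guard in a for-comprehension only ever yields tuples. The explicit
  // type annotation makes the compiler verify the element type.
  val typed: List[(String, WikipediaArticle)] =
    for {
      a <- articles
      l <- langs
      if a.mentionsLanguage(l)
    } yield (l, a)

  def main(args: Array[String]): Unit =
    typed.foreach(println)
}
```

The same for-comprehension shape should carry over to an RDD by keeping rdd.flatMap on the outside and producing only the matching tuples inside it. The Scala REPL, and therefore spark-shell, also prints the inferred type of every val it defines, which is another way to inspect the type without an IDE.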

Anyone? Please?

Irene



---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: groupby question

Posted by wilson <wi...@4shield.net>.
I don't know what you were trying to express.
It would be better if you gave a sample dataset and the purpose you want 
to achieve; then we may be able to give the right solution.

Thanks

Irene Markelic wrote:
> I have an RDD that I want to group according to some key, but it just 
> doesn't work. I am a Scala beginner. So I have the following RDD:
