Posted to user@spark.apache.org by Daniel Haviv <da...@veracity-group.com> on 2016/06/06 17:43:09 UTC

groupByKey returns an emptyRDD

Hi,
I wrapped the following code into a jar:

val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1)")))

val agg = test.groupByKey()
agg.collect().foreach(r => println(r._1))


The result of groupByKey is an empty RDD, but when I run the same
code in the spark-shell it works as expected.
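
For reference, here is a minimal sketch of how such a snippet is
typically wrapped into a standalone application (the object name
GroupByKeyTest is a placeholder; unlike the spark-shell, which
provides sc automatically, a packaged application has to create its
own SparkContext):

import org.apache.spark.{SparkConf, SparkContext}

object GroupByKeyTest {
  def main(args: Array[String]): Unit = {
    // In a jar there is no pre-built sc, so the app constructs one itself
    val conf = new SparkConf().setAppName("GroupByKeyTest")
    val sc = new SparkContext(conf)

    val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1)")))

    // groupByKey yields an RDD[(String, Iterable[String])]
    val agg = test.groupByKey()
    agg.collect().foreach(r => println(r._1))

    sc.stop()
  }
}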


Any ideas?


Thank you,

Daniel

Re: groupByKey returns an emptyRDD

Posted by Ted Yu <yu...@gmail.com>.
Can you give us a bit more information?

- how you packaged the code into the jar
- the command you used for execution (e.g., the spark-submit invocation; see the sample below)
- the version of Spark
- a related log snippet
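
For example, something along these lines (the class and jar names
here are just placeholders):

spark-submit \
  --class GroupByKeyTest \
  --master local[*] \
  groupbykey-test.jar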

Thanks

On Mon, Jun 6, 2016 at 10:43 AM, Daniel Haviv <
daniel.haviv@veracity-group.com> wrote:

> Hi,
> I wrapped the following code into a jar:
>
> val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1)")))
>
> val agg = test.groupByKey()
> agg.collect().foreach(r => println(r._1))
>
>
> The result of groupByKey is an empty RDD, but when I run the same code in the spark-shell it works as expected.
>
>
> Any ideas?
>
>
> Thank you,
>
> Daniel
>
>