
Posted to user@spark.apache.org by asma zgolli <zg...@gmail.com> on 2019/03/21 17:46:28 UTC

Fwd: Cross Join

---------- Forwarded message ---------
From: asma zgolli <zg...@gmail.com>
Date: Thu, Mar 21, 2019 at 18:15
Subject: Cross Join
To: <de...@spark.apache.org>


Hello,

I need to cross my data, so I'm executing a cross join on two DataFrames:

C = A.crossJoin(B)
A has 50 records
B has 5 records

The result I'm getting with Spark 2.0 is a DataFrame C with only 50 records.

Only the first row from B was added to C.

Is that a bug in Spark?

Asma ZGOLLI

PhD student in data engineering - computer science



-- 
Asma ZGOLLI

PhD student in data engineering - computer science
Email : zgolliasma@gmail.com
email alt:  asma.zgolli@univ-grenoble-alpes.fr <zg...@gmail.com>
Tel : (+33) 07 52 95 04 45
        (+216) 50 126 797
Skype : asma_zgolli

Re: Cross Join

Posted by kathy Harayama <ka...@gmail.com>.

Hello,
I'm using Spark 2.4, and it works:

scala> val df_A=Seq(("1",10.0),("2",20.0),("3",30.0),("4",40.0),("5",50.0),("6",60.0),("7",70.0),("8",80.0),("9",90.0),("10",10.0)).toDF("id","val");
df_A: org.apache.spark.sql.DataFrame = [id: string, val: double]

scala> val df_B=Seq(("11", 10.0),("12",20.0),("13",30.0)).toDF("id","val");
df_B: org.apache.spark.sql.DataFrame = [id: string, val: double]

scala> val df_C=df_A.crossJoin(df_B)
df_C: org.apache.spark.sql.DataFrame = [id: string, val: double ... 2 more fields]

scala> df_C.show(30);
+---+----+---+----+
| id| val| id| val|
+---+----+---+----+
|  1|10.0| 11|10.0|
|  1|10.0| 12|20.0|
|  1|10.0| 13|30.0|
|  2|20.0| 11|10.0|
|  2|20.0| 12|20.0|
|  2|20.0| 13|30.0|
|  3|30.0| 11|10.0|
|  3|30.0| 12|20.0|
|  3|30.0| 13|30.0|
|  4|40.0| 11|10.0|
|  4|40.0| 12|20.0|
|  4|40.0| 13|30.0|
|  5|50.0| 11|10.0|
|  5|50.0| 12|20.0|
|  5|50.0| 13|30.0|
|  6|60.0| 11|10.0|
|  6|60.0| 12|20.0|
|  6|60.0| 13|30.0|
|  7|70.0| 11|10.0|
|  7|70.0| 12|20.0|
|  7|70.0| 13|30.0|
|  8|80.0| 11|10.0|
|  8|80.0| 12|20.0|
|  8|80.0| 13|30.0|
|  9|90.0| 11|10.0|
|  9|90.0| 12|20.0|
|  9|90.0| 13|30.0|
| 10|10.0| 11|10.0|
| 10|10.0| 12|20.0|
| 10|10.0| 13|30.0|
+---+----+---+----+
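
To check the original 50 x 5 case directly, a row count may be easier to read than a long show() listing. The following is only a minimal sketch, assuming a local SparkSession and synthetic data: the object name CrossJoinCheck, the helper crossJoinCount, and the generated ids and values are hypothetical and are not taken from the original job. It builds two DataFrames of the given sizes, cross joins them, and returns the number of rows in the result, which for a Cartesian product should equal the product of the two input counts.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, for illustration only: not part of Spark itself.
object CrossJoinCheck {

  // Builds two synthetic DataFrames with nA and nB rows, cross joins them,
  // and returns the row count of the result.
  def crossJoinCount(spark: SparkSession, nA: Int, nB: Int): Long = {
    import spark.implicits._

    // Synthetic stand-ins for A and B (assumed shape: id string, val double).
    val dfA = (1 to nA).map(i => (i.toString, i * 10.0)).toDF("id", "val")
    val dfB = (1 to nB).map(i => ((1000 + i).toString, i * 10.0)).toDF("id", "val")

    // Cartesian product: every row of dfA paired with every row of dfB.
    dfA.crossJoin(dfB).count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for a quick check; adjust master for a real cluster.
    val spark = SparkSession.builder()
      .appName("CrossJoinCheck")
      .master("local[*]")
      .getOrCreate()

    val expected = 50L * 5L
    val actual = crossJoinCount(spark, 50, 5)
    println(s"expected=$expected actual=$actual")

    spark.stop()
  }
}
```

If the count on the real A and B equals A.count() only, comparing B.count() just before the join may help to rule out B having been reduced to a single row earlier in the pipeline. Also note that the Dataset API documents crossJoin as available since 2.1.0, so it may be worth confirming which Spark version the job actually runs on.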

Kathleen
