Posted to user@spark.apache.org by Jerry Raj <je...@gmail.com> on 2014/12/17 06:43:01 UTC

Spark SQL DSL for joins?

Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I 
have two tables (backed by Parquet files) and I need to do a join across 
them using a common field (user_id). This works fine using standard SQL 
but not with the language-integrated DSL; neither

t1.join(t2, on = 't1.user_id == t2.user_id)

nor

t1.join(t2, on = Some('t1.user_id == t2.user_id))

works, or even compiles. I could not find any examples of how to perform a 
join using the DSL. Any pointers will be appreciated :)

Thanks
-Jerry


Re: Spark SQL DSL for joins?

Posted by Jerry Raj <je...@gmail.com>.
Thanks, that helped. And I needed SchemaRDD.as() to provide an alias for 
the RDD.
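
For the archives, what ended up compiling for me looks roughly like this
(untested as pasted here; the Parquet paths and the 'a / 'b aliases are
placeholders, and I'm spelling the qualified column names with the String
.attr conversion from the expression DSL):

    val t1 = sqlCtx.parquetFile("/path/to/t1.parquet").as('a)
    val t2 = sqlCtx.parquetFile("/path/to/t2.parquet").as('b)

    // === builds a Catalyst equality expression; the .as() aliases give the
    // two sides distinct qualifiers so a.user_id / b.user_id can resolve
    val joined = t1.join(t2, on = Some("a.user_id".attr === "b.user_id".attr))
    joined.count()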

-Jerry

On 17/12/14 12:12 pm, Tobias Pfeiffer wrote:
> Jerry,
>
> On Wed, Dec 17, 2014 at 3:35 PM, Jerry Raj <jerry.raj@gmail.com> wrote:
>
>     Another problem with the DSL:
>
>     t1.where('term == "dmin").count() returns zero.
>
>
> Looks like you need ===:
> https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>
> Tobias
>


Re: Spark SQL DSL for joins?

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Jerry,

On Wed, Dec 17, 2014 at 3:35 PM, Jerry Raj <je...@gmail.com> wrote:
>
> Another problem with the DSL:
>
> t1.where('term == "dmin").count() returns zero.


Looks like you need ===:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
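
With ==, you are comparing the Symbol 'term against the String "dmin",
which is plain Scala equality and always false; my guess is that the
resulting Boolean then gets turned into a constant false predicate by the
DSL's implicit conversions, which is why it compiles but matches no rows.
=== builds a Catalyst equality expression on the column instead, i.e.
something like (untested):

    // compares the value of the term column against "dmin" for each row
    t1.where('term === "dmin").count()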

Tobias

Re: Spark SQL DSL for joins?

Posted by Jerry Raj <je...@gmail.com>.
Another problem with the DSL:

t1.where('term == "dmin").count() returns zero. But
sqlCtx.sql("select * from t1 where term = 'dmin').count() returns 700, 
which I know is correct from the data. Is there something wrong with how 
I'm using the DSL?

Thanks


On 17/12/14 11:13 am, Jerry Raj wrote:
> Hi,
> I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
> have two tables (backed by Parquet files) and I need to do a join across
> them using a common field (user_id). This works fine using standard SQL
> but not with the language-integrated DSL; neither
>
> t1.join(t2, on = 't1.user_id == t2.user_id)
>
> nor
>
> t1.join(t2, on = Some('t1.user_id == t2.user_id))
>
> works, or even compiles. I could not find any examples of how to perform a
> join using the DSL. Any pointers will be appreciated :)
>
> Thanks
> -Jerry
>


Re: Spark SQL DSL for joins?

Posted by Cheng Lian <li...@gmail.com>.
On 12/17/14 1:43 PM, Jerry Raj wrote:

> Hi,
> I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I 
> have two tables (backed by Parquet files) and I need to do a join 
> across them using a common field (user_id). This works fine using 
> standard SQL but not with the language-integrated DSL; neither
>
> t1.join(t2, on = 't1.user_id == t2.user_id)

Two issues with this line:

 1. Use === instead of ==
 2. Add a single quote before t2
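
Something along these lines should work (untested; I am writing the
qualified columns with the String .attr conversion, and you will probably
also need to alias the two SchemaRDDs with .as(...) so that the qualifiers
resolve):

    // 'a and 'b are the aliases; === is Catalyst equality, not Scala ==
    t1.as('a).join(t2.as('b), on = Some("a.user_id".attr === "b.user_id".attr))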

>
> nor
>
> t1.join(t2, on = Some('t1.user_id == t2.user_id))
>
> works, or even compiles. I could not find any examples of how to perform 
> a join using the DSL. Any pointers will be appreciated :)
>
> Thanks
> -Jerry
>