Posted to user@spark.apache.org by Jerry Raj <je...@gmail.com> on 2014/12/17 06:43:01 UTC
Spark SQL DSL for joins?
Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
have two tables (backed by Parquet files) and I need to join them on a
common field (user_id). This works fine using standard SQL, but not with
the language-integrated DSL: neither

t1.join(t2, on = 't1.user_id == t2.user_id)

nor

t1.join(t2, on = Some('t1.user_id == t2.user_id))

works, or even compiles. I could not find any examples of how to perform a
join using the DSL. Any pointers would be appreciated :)
Thanks
-Jerry
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark SQL DSL for joins?
Posted by Jerry Raj <je...@gmail.com>.
Thanks, that helped. And I needed SchemaRDD.as() to provide an alias for
the RDD.
-Jerry
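
[Editor's note: a minimal sketch of the combination described above, against
the Spark 1.1-era SchemaRDD API. It assumes the SQLContext implicits are
imported (import sqlContext._) and that t1 and t2 are the Parquet-backed
tables; the 'a and 'b alias names are illustrative.]

    // Alias each side with SchemaRDD.as so the shared user_id column
    // can be qualified, then build the join condition with === (which
    // creates a Catalyst equality Expression, unlike plain ==).
    val joined = t1.as('a).join(t2.as('b),
      on = Some("a.user_id".attr === "b.user_id".attr))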
On 17/12/14 12:12 pm, Tobias Pfeiffer wrote:
> Jerry,
>
> On Wed, Dec 17, 2014 at 3:35 PM, Jerry Raj <jerry.raj@gmail.com> wrote:
>
> Another problem with the DSL:
>
> t1.where('term == "dmin").count() returns zero.
>
>
> Looks like you need ===:
> https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>
> Tobias
>
Re: Spark SQL DSL for joins?
Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Jerry,
On Wed, Dec 17, 2014 at 3:35 PM, Jerry Raj <je...@gmail.com> wrote:
>
> Another problem with the DSL:
>
> t1.where('term == "dmin").count() returns zero.
Looks like you need ===:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
Tobias
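
[Editor's note: a side note on why the == version compiles but matches
nothing. 'term is a plain scala.Symbol, and == is Scala's universal
equality, so 'term == "dmin" is evaluated on the spot to the constant
false (a Symbol never equals a String) instead of building a column
expression. The snippet below demonstrates just that Scala-level
behaviour; no Spark is involved. Symbol("term") is the modern spelling
of the 'term literal.]

```scala
object EqVsTripleEq {
  def main(args: Array[String]): Unit = {
    // Plain == compares a Symbol with a String and is evaluated
    // immediately -- the result is the constant false, which is why
    // a where() built from it filters out every row.
    val plainEq = Symbol("term") == "dmin"
    println(plainEq)  // prints: false
  }
}
```

[=== on the other hand is an operator the DSL's implicits add to the
attribute, so the comparison is recorded as an expression and evaluated
per row by Catalyst.]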
Re: Spark SQL DSL for joins?
Posted by Jerry Raj <je...@gmail.com>.
Another problem with the DSL:
t1.where('term == "dmin").count() returns zero. But
sqlCtx.sql("select * from t1 where term = 'dmin'").count() returns 700,
which I know is correct from the data. Is there something wrong with how
I'm using the DSL?
Thanks
On 17/12/14 11:13 am, Jerry Raj wrote:
> Hi,
> I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
> have two tables (backed by Parquet files) and I need to do a join across
> them using a common field (user_id). This works fine using standard SQL
> but not using the language-integrated DSL neither
>
> t1.join(t2, on = 't1.user_id == t2.user_id)
>
> nor
>
> t1.join(t2, on = Some('t1.user_id == t2.user_id))
>
> work, or even compile. I could not find any examples of how to perform a
> join using the DSL. Any pointers will be appreciated :)
>
> Thanks
> -Jerry
>
Re: Spark SQL DSL for joins?
Posted by Cheng Lian <li...@gmail.com>.
On 12/17/14 1:43 PM, Jerry Raj wrote:
> Hi,
> I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
> have two tables (backed by Parquet files) and I need to do a join
> across them using a common field (user_id). This works fine using
> standard SQL but not using the language-integrated DSL neither
>
> t1.join(t2, on = 't1.user_id == t2.user_id)
Two issues with this line:
1. Use === instead of ==
2. Add a single quote before t2
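
[Editor's note: with both fixes applied (and, per Jerry's follow-up,
aliases from SchemaRDD.as so the two user_id columns can be told apart),
the condition becomes a real Expression rather than a Boolean. A sketch,
assuming the SQLContext implicits are in scope and illustrative aliases:]

    t1.as('a).join(t2.as('b),
      on = Some("a.user_id".attr === "b.user_id".attr))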
>
> nor
>
> t1.join(t2, on = Some('t1.user_id == t2.user_id))
>
> work, or even compile. I could not find any examples of how to perform
> a join using the DSL. Any pointers will be appreciated :)
>
> Thanks
> -Jerry
>