Posted to user@spark.apache.org by Hao Ren <in...@gmail.com> on 2014/12/17 19:32:03 UTC

SparkSQL 1.2.1-snapshot Left Join problem

Hi,

When running the SparkSQL 1.2.1 branch on an EC2 standalone cluster, the following
query hangs and never completes:

create table debug as
select v1.* 
from t1 as v1 left join t2 as v2
on v1.sku = v2.sku
where v2.sku is null

Both t1 and t2 have 200 partitions.
t1 has 10k rows, and t2 has 4k rows.
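
To make the intended result concrete: the query is an anti-join, i.e. it
should keep only the rows of t1 whose sku has no match in t2. The sketch below
runs the same statement against an in-memory SQLite database through Python's
sqlite3 module, not against Spark, so it only illustrates the expected
semantics and says nothing about the hang. The `name` column, the sample rows
and the helper name `anti_join_skus` are made up for illustration.

```python
import sqlite3


def anti_join_skus(t1_rows, t2_rows):
    """Return the t1 rows whose sku has no match in t2.

    Assumption: t1 is (sku, name) and t2 is (sku); the real schemas are
    not given in the post.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (sku TEXT, name TEXT)")
    conn.execute("CREATE TABLE t2 (sku TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO t2 VALUES (?)", t2_rows)

    # Same shape as the query in the post, executed by SQLite instead of Spark.
    conn.execute(
        """
        CREATE TABLE debug AS
        SELECT v1.*
        FROM t1 AS v1 LEFT JOIN t2 AS v2
        ON v1.sku = v2.sku
        WHERE v2.sku IS NULL
        """
    )
    rows = conn.execute("SELECT sku, name FROM debug ORDER BY sku").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Only sku "b" is missing from t2, so only that row should survive.
    print(anti_join_skus(
        [("a", "apple"), ("b", "banana"), ("c", "cherry")],
        [("a",), ("c",)],
    ))
```

With the real tables, the equivalent check would be that debug contains
exactly the t1 rows whose sku is absent from t2.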

The query blocks at:

14/12/17 15:56:54 INFO TaskSetManager: Finished task 133.0 in stage 2.0 (TID
541) in 370 ms on ip-10-79-184-49.ec2.internal (122/200)

Via the WebUI, I can see that 24 tasks are running, as the cluster has 24 cores.
All the other tasks have succeeded. It seems that these 24 tasks are blocked and
never finish.

However, the same query works fine on SparkSQL 1.1.0. There might be a problem
with "join" on 1.2.1.

Any ideas?

Hao





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-1-2-1-snapshot-Left-Join-problem-tp20748.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: SparkSQL 1.2.1-snapshot Left Join problem

Posted by Cheng Lian <li...@gmail.com>.

Could you please file a JIRA together with the Git commit you're using? 
Thanks!


