Posted to issues@spark.apache.org by "Vadim Tkachenko (JIRA)" <ji...@apache.org> on 2016/01/12 07:20:39 UTC

[jira] [Updated] (SPARK-12763) Spark gets stuck executing SSB query

     [ https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Tkachenko updated SPARK-12763:
------------------------------------
    Attachment: Spark shell - Details for Stage 5 (Attempt 0).pdf

Details on the stalled stage

> Spark gets stuck executing SSB query
> ------------------------------------
>
>                 Key: SPARK-12763
>                 URL: https://issues.apache.org/jira/browse/SPARK-12763
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>         Environment: Standalone cluster
>            Reporter: Vadim Tkachenko
>         Attachments: Spark shell - Details for Stage 5 (Attempt 0).pdf
>
>
> I am trying to emulate an SSB (Star Schema Benchmark) load. The data was generated with https://github.com/Percona-Lab/ssb-dbgen
> at scale factor 1000 and converted to Parquet format.
> I then run the following script:
> val pLineOrder = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/lineorder").cache()
> val pDate = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/date").cache()
> val pPart = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/part").cache()
> val pSupplier = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/supplier").cache()
> val pCustomer = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/customer").cache()
> pLineOrder.registerTempTable("lineorder")
> pDate.registerTempTable("date")
> pPart.registerTempTable("part")
> pSupplier.registerTempTable("supplier")
> pCustomer.registerTempTable("customer")
> The query is:
> val sql41 = sqlContext.sql("""
>   select D_YEAR, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) as profit
>   from date, customer, supplier, part, lineorder
>   where LO_CUSTKEY = C_CUSTKEY
>     and LO_SUPPKEY = S_SUPPKEY
>     and LO_PARTKEY = P_PARTKEY
>     and LO_ORDERDATE = D_DATEKEY
>     and C_REGION = 'AMERICA'
>     and S_REGION = 'AMERICA'
>     and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2')
>   group by D_YEAR, C_NATION
>   order by D_YEAR, C_NATION
> """)
> Running
> sql41.show()
> gets stuck: at some point there is no further progress and the server is fully idle, but the job remains at the same stage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)