Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:21:28 UTC
[jira] [Updated] (SPARK-12763) Spark gets stuck executing SSB query
[ https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-12763:
---------------------------------
Labels: bulk-closed (was: )
> Spark gets stuck executing SSB query
> ------------------------------------
>
> Key: SPARK-12763
> URL: https://issues.apache.org/jira/browse/SPARK-12763
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: Standalone cluster
> Reporter: Vadim Tkachenko
> Priority: Major
> Labels: bulk-closed
> Attachments: Spark shell - Details for Stage 5 (Attempt 0).pdf
>
>
> I am trying to emulate an SSB (Star Schema Benchmark) load. The data was generated with https://github.com/Percona-Lab/ssb-dbgen
> at scale factor 1000 and converted to Parquet format.
> The tables are loaded and registered with the following script:
> val pLineOrder = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/lineorder").cache()
> val pDate = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/date").cache()
> val pPart = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/part").cache()
> val pSupplier = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/supplier").cache()
> val pCustomer = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/customer").cache()
> pLineOrder.registerTempTable("lineorder")
> pDate.registerTempTable("date")
> pPart.registerTempTable("part")
> pSupplier.registerTempTable("supplier")
> pCustomer.registerTempTable("customer")
> The query is:
> val sql41=sqlContext.sql("select D_YEAR, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) as profit from date, customer, supplier, part, lineorder where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 'AMERICA' and S_REGION = 'AMERICA' and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2') group by D_YEAR, C_NATION order by D_YEAR, C_NATION")
> and calling
> sql41.show()
> gets stuck: at some point there is no progress and the server is fully idle, but the job stays at the same stage.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)