Posted to issues@spark.apache.org by "Liang Lee (JIRA)" <ji...@apache.org> on 2015/05/06 08:59:59 UTC

[jira] [Updated] (SPARK-7393) How to improve Spark SQL performance?

     [ https://issues.apache.org/jira/browse/SPARK-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Lee updated SPARK-7393:
-----------------------------
    Description: 
We want to use Spark SQL in our project, but we found that Spark SQL performance is not as good as we expected. The details are as follows:
 1. We save the data as a Parquet file on HDFS.
 2. We select just one or a few rows from the Parquet file using Spark SQL.
 3. When the total record count is 61 million, it takes about 3 seconds to get the result, which is unacceptably long for our scenario.
 4. When the total record count is 2 million, it takes about 93 ms to get the result, which is still a little long for us.
 5. The query statement is like: SELECT * FROM DBA WHERE COLA=? AND COLB=? (see the sketch after this list). The table is not complex: it has fewer than 10 columns, and the content of each column is less than 100 bytes.
 6. Does anyone know how to improve the performance, or have other ideas?
 7. Can Spark SQL support microsecond-level response times?
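For concreteness, here is a minimal sketch of the scenario in items 1-5, assuming the Spark 1.4+ DataFrame API that was current around the time of this report. The HDFS path, the filter literals, and the object name are hypothetical; DBA, COLA, and COLB are the placeholder names used above.

// Minimal sketch of the lookup described above (Spark 1.4+ API assumed).
// The HDFS path and filter literals are hypothetical; DBA, COLA and COLB
// are the placeholder names from the report.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetPointLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParquetPointLookup"))
    val sqlContext = new SQLContext(sc)

    // 1. The data is stored as a Parquet file on HDFS.
    val df = sqlContext.read.parquet("hdfs:///data/dba.parquet")
    df.registerTempTable("DBA")

    // Optional: caching the table keeps later lookups in memory instead of
    // re-reading the Parquet file from HDFS on every query.
    // sqlContext.cacheTable("DBA")

    // 2. A point lookup on two columns, as in item 5 above.
    val result = sqlContext.sql(
      "SELECT * FROM DBA WHERE COLA = 'a' AND COLB = 'b'")
    result.show()

    sc.stop()
  }
}

Note that each sql() call still goes through query planning and distributed task scheduling, so some fixed per-query overhead applies regardless of table size; that overhead is worth keeping in mind when judging the 93 ms figure above.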

> How to improve Spark SQL performance?
> -------------------------------------
>
>                 Key: SPARK-7393
>                 URL: https://issues.apache.org/jira/browse/SPARK-7393
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Liang Lee



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org