Posted to user@spark.apache.org by fanooos <de...@gmail.com> on 2015/03/22 11:38:12 UTC

Spark sql thrift server slower than hive

We have Cloudera CDH 5.3 installed on one machine.

We are trying to use the Spark SQL Thrift server to execute some analysis
queries against a Hive table.

Without any changes to the configuration, we ran the following query on
both Hive and the Spark SQL Thrift server:

*select * from tableName;*

The time taken by Spark is longer than the time taken by Hive, which is not
how it is supposed to be.

The Hive table is mapped to JSON files stored in an HDFS directory, and we are
using *org.openx.data.jsonserde.JsonSerDe* for
serialization/deserialization.

Why does Spark take much more time to execute the query than Hive?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-sql-thrift-server-slower-than-hive-tp22177.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Spark sql thrift server slower than hive

Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.

A basic change Spark needs is setting the executor memory, which
defaults to 512 MB.
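
A minimal sketch of that change, assuming the Thrift server picks up its
settings from conf/spark-defaults.conf. The 2g figure is purely an
illustrative assumption, not a value recommended anywhere in this thread;
size it to the memory actually available on the machine.

```
# conf/spark-defaults.conf (file location assumed; adjust to your install)
# Illustrative value only: raise executor memory above the 512 MB default.
spark.executor.memory   2g
```

The Thrift server has to be restarted for the new value to take effect.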

On Mon, Mar 23, 2015 at 10:16 AM, Denny Lee <de...@gmail.com> wrote:

> Out of curiosity, how are you running your Spark instance: via YARN or
> standalone mode?  When connecting the Spark Thrift server to the Spark
> service, have you allocated enough memory and CPU to Spark?


-- 

*Arush Kharbanda* || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com

Re: Spark sql thrift server slower than hive

Posted by Denny Lee <de...@gmail.com>.

Out of curiosity, how are you running your Spark instance: via YARN or
standalone mode?  When connecting the Spark Thrift server to the Spark
service, have you allocated enough memory and CPU to Spark?
