
Posted to user@spark.apache.org by Judy Nash <ju...@exchange.microsoft.com> on 2014/11/21 04:06:59 UTC

beeline via spark thrift doesn't retain cache

Hi friends,

I have successfully set up the Thrift server and run Beeline against it.

Beeline handles select queries just fine, but it cannot seem to do any kind of caching/RDD operation.

For example:

1)      The "cache table" command doesn't work. See the error:

Error: Error while processing statement: FAILED: ParseException line 1:0 cannot recognize input near 'cache' 'table' 'hivesampletable' (state=42000,code=40000)

2)      Re-running SQL commands shows no performance improvement.

By comparison, the Spark SQL shell can execute the "cache table" command, and rerunning a SQL command gives a huge performance boost.

Am I missing something, or is this expected when executing through the Spark Thrift server?

Thanks!
Judy
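
For anyone comparing their own output against the error in item 1, the message can be split into its parts programmatically. This is only a sketch: the helper name `parse_beeline_error` and the regular expression are hypothetical, written against the single error quoted above rather than against any documented Beeline output format.

```python
import re

# Pattern written against the one error quoted in this thread;
# it is an assumption, not a documented Beeline output format.
ERROR_RE = re.compile(
    r"FAILED:\s+(?P<exception>\w+)\s+line\s+(?P<line>\d+):(?P<col>\d+)\s+"
    r"(?P<detail>.*?)\s*\(state=(?P<state>\w+),code=(?P<code>\d+)\)"
)

def parse_beeline_error(text):
    """Return the fields of a 'FAILED: ...' error, or None if it does not match."""
    # The message was wrapped across several lines, so collapse whitespace first.
    flat = " ".join(text.split())
    match = ERROR_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    # The quoted words are the tokens the parser could not recognize.
    fields["tokens"] = re.findall(r"'([^']*)'", fields["detail"])
    return fields

sample = """Error: Error while processing statement: FAILED: ParseException line 1:0 cannot

recognize input near 'cache' 'table' 'hivesampletable' (state=42000,code=40000)"""

info = parse_beeline_error(sample)
print(info["exception"], info["tokens"], info["state"], info["code"])
```

The exception name and the rejected tokens show that the statement failed at parse time on the `cache` keyword itself, before any table lookup or execution.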

RE: beeline via spark thrift doesn't retain cache

Posted by Judy Nash <ju...@exchange.microsoft.com>.

Thanks, Yanbo.
My issue was 1). I had the Spark Thrift server set up, but it was running against Hive instead of Spark SQL due to a local change.

After I fixed this, Beeline automatically caches rerun queries and accepts "cache table".
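
One way to check the same thing on another setup is to time a query before and after `CACHE TABLE`. The sketch below makes several assumptions: `beeline` is on the `PATH`, the Thrift server listens at `jdbc:hive2://localhost:10000`, and a rejected statement prints a line containing "Error:" as in the message quoted earlier. The function names are hypothetical. `-u` and `-e` are Beeline's options for the JDBC URL and an inline statement.

```python
import subprocess
import time

# Assumed JDBC URL; point it at the host and port of your own Thrift server.
JDBC_URL = "jdbc:hive2://localhost:10000"

def beeline_command(sql, jdbc_url=JDBC_URL):
    """Build the argument list that runs one statement through beeline."""
    return ["beeline", "-u", jdbc_url, "-e", sql]

def timed_run(sql, jdbc_url=JDBC_URL, runner=subprocess.run):
    """Run one statement; return (elapsed seconds, completed process)."""
    start = time.perf_counter()
    result = runner(beeline_command(sql, jdbc_url), capture_output=True, text=True)
    return time.perf_counter() - start, result

def compare_cached(table, query, runner=subprocess.run):
    """Time a query before and after caching the table it reads."""
    before, _ = timed_run(query, runner=runner)
    _, cached = timed_run(f"CACHE TABLE {table}", runner=runner)
    # Assumption: a rejected statement shows up as "Error:" in beeline's output,
    # as in the message quoted earlier in this thread.
    if "Error:" in (cached.stdout + cached.stderr):
        raise RuntimeError(f"CACHE TABLE was rejected: {cached.stderr.strip()}")
    after, _ = timed_run(query, runner=runner)
    return before, after
```

A call such as `compare_cached("hivesampletable", "SELECT COUNT(*) FROM hivesampletable")` (the query is an illustrative placeholder) returns the two timings. Each timing includes Beeline's own startup and connection cost, and depending on the Spark version the cache may be filled lazily, so a second run after caching gives a fairer number.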

From: Yanbo Liang [mailto:yanbohappy@gmail.com]
Sent: Friday, November 21, 2014 12:42 AM
To: Judy Nash
Cc: user@spark.incubator.apache.org
Subject: Re: beeline via spark thrift doesn't retain cache

1) Make sure your Beeline client is connected to the HiveServer2 of Spark SQL.
You can find the HiveServer2 execution logs in the environment where start-thriftserver.sh was run.
2) How large is your data? If you cache only a small amount of data, most of the time goes to scheduling the workload across executors, so caching may show little benefit.
Check the configuration of the Spark execution environment to see whether there is enough memory for RDD storage. If not, it will take some time to serialize/deserialize data between memory and disk.


Re: beeline via spark thrift doesn't retain cache

Posted by Yanbo Liang <ya...@gmail.com>.

1) Make sure your Beeline client is connected to the HiveServer2 of Spark SQL.
You can find the HiveServer2 execution logs in the environment where
start-thriftserver.sh was run.
2) How large is your data? If you cache only a small amount of data, most
of the time goes to scheduling the workload across executors, so caching
may show little benefit.
Check the configuration of the Spark execution environment to see whether
there is enough memory for RDD storage. If not, it will take some time to
serialize/deserialize data between memory and disk.
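
To put a rough number on "enough memory for RDD storage": in Spark 1.x releases that predate unified memory management, the cache on each executor was bounded by roughly heap size times `spark.storage.memoryFraction` times `spark.storage.safetyFraction`, with defaults of 0.6 and 0.9. The sketch below applies only that arithmetic. The function names, the cluster size, and the table sizes are made-up inputs, and the estimate ignores JVM overhead and the fact that cached data can be larger or smaller in memory than on disk.

```python
def storage_capacity_bytes(executor_heap_bytes, num_executors,
                           memory_fraction=0.6, safety_fraction=0.9):
    """Approximate cluster-wide cache capacity under the Spark 1.x legacy memory model."""
    per_executor = executor_heap_bytes * memory_fraction * safety_fraction
    return int(per_executor * num_executors)

def fits_in_cache(table_bytes, executor_heap_bytes, num_executors):
    """True if the table is no larger than the estimated cache capacity."""
    return table_bytes <= storage_capacity_bytes(executor_heap_bytes, num_executors)

GIB = 1024 ** 3
# Hypothetical cluster: 4 executors with 2 GiB of heap each.
capacity = storage_capacity_bytes(2 * GIB, 4)
print(f"estimated cache capacity: {capacity / GIB:.2f} GiB")
print("5 GiB table fits:", fits_in_cache(5 * GIB, 2 * GIB, 4))
```

If the table does not fit by this estimate, part of it will be recomputed or read back from disk on each query, which matches the slowdown described above.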
