Posted to user@spark.apache.org by Imran Rajjad <ra...@gmail.com> on 2017/07/19 12:49:26 UTC

Slow response on Solr Cloud with Spark

Greetings,

We are trying out Spark 2 + ThriftServer to join multiple
collections from a Solr Cloud (6.4.x) cluster. I have followed this blog post:
https://lucidworks.com/2015/08/20/solr-spark-sql-datasource/

I understand that Spark initially populates the temporary table with
18,633,014 records and that this takes time. However, every subsequent SQL
query on the temporary table takes just as long. It seems the temporary
table is not being reused or cached. The fields in the Solr collection do
not have docValues enabled; could that be the reason? Apparently I have
missed a trick.

regards,
Imran

-- 
I.R

Re: Slow response on Solr Cloud with Spark

Posted by Anastasios Zouzias <zo...@gmail.com>.
Hi Imran,

It seems that you do not cache your underlying DataFrame. I would suggest
forcing a cache with tweets.cache() followed by tweets.count(). Because
cache() is lazy, it is the count() action that actually loads the data into
memory. Let us know if your problem persists.

Best,
Anastasios

-- 
Anastasios Zouzias
<az...@zurich.ibm.com>