Posted to user@spark.apache.org by Mohamed Nadjib Mami <ma...@iai.uni-bonn.de> on 2016/11/14 10:03:37 UTC

SparkSQL: intra-SparkSQL-application table registration

Hello,

I asked the following question [1] on Stack Overflow but haven't received 
an answer yet. I'm now using this channel to give it more visibility, and 
hopefully find someone who can help.

"*Context.* I have tens of SQL queries stored in separate files. For 
benchmarking purposes, I created an application that iterates through 
each of those query files and passes it to a standalone Spark 
application. This latter /first/ parses the query, extracts the used 
tables, registers them (using: registerTempTable() in Spark < 2 and 
createOrReplaceTempView() in Spark 2), and executes effectively the 
query (spark.sql()).
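
For concreteness, a minimal sketch of that loop (Spark 2.x; the 
directory layout, the Parquet files named after their tables, and 
extractTables() are assumptions standing in for my actual setup):

import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession

object QueryBenchmark {
  // Placeholder for the real parser that extracts table names from a query.
  def extractTables(sql: String): Seq[String] = ???

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-benchmark").getOrCreate()
    val queryDir = args(0)  // directory containing the *.sql files
    val dataDir  = args(1)  // directory containing <table>.parquet files

    val queryFiles = Files.list(Paths.get(queryDir)).iterator().asScala
      .filter(_.toString.endsWith(".sql"))

    for (file <- queryFiles) {
      val sql = new String(Files.readAllBytes(file), "UTF-8")
      // Today, every table is (re-)registered for every single query --
      // this is the step I would like to do only once per table.
      for (table <- extractTables(sql)) {
        spark.read.parquet(s"$dataDir/$table.parquet")
          .createOrReplaceTempView(table)
      }
      spark.sql(sql).collect()  // force the query to actually run
    }
    spark.stop()
  }
}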

*Challenge.* Since registering the tables can be time-consuming, I 
would like to register each table only once, when it is first used, and 
keep that registration as metadata that subsequent queries can reuse 
without re-registering the table. It's a sort of intra-application 
caching, but not any of the caching Spark itself offers (table 
caching), as far as I know.
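
For concreteness, here is the sort of bookkeeping I have in mind. This 
is only a sketch: TableRegistry and ensureRegistered() are hypothetical 
names, tables are again assumed to be Parquet files named after the 
table, and it presumes all queries run inside a single long-lived 
SparkSession (temp views do not outlive the session that created them):

import scala.collection.mutable
import org.apache.spark.sql.SparkSession

class TableRegistry(spark: SparkSession, dataDir: String) {
  // Names of tables already registered in this session.
  private val registered = mutable.Set.empty[String]

  // Register `table` as a temp view only the first time it is requested.
  def ensureRegistered(table: String): Unit =
    if (registered.add(table)) {  // add() returns false if already present
      spark.read.parquet(s"$dataDir/$table.parquet")
        .createOrReplaceTempView(table)
    }
}

Inside the loop above, extractTables(sql).foreach(registry.ensureRegistered) 
would then replace the unconditional registration.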

Is that possible? If not, can anyone suggest another approach to 
accomplish the same goal (i.e., iterating through separate query files 
and running the queries without re-registering tables that have already 
been registered)?"

[1]: http://stackoverflow.com/questions/40549924/sparksql-intra-sparksql-application-table-registration

Cheers,
Mohamed