Posted to user@spark.apache.org by Mohamed Nadjib Mami <ma...@iai.uni-bonn.de> on 2016/11/14 10:03:37 UTC
SparkSQL: intra-SparkSQL-application table registration
Hello,
I asked the following question [1] on Stack Overflow but have not
received an answer yet. I am now using this channel to give it more
visibility and hopefully find someone who can help.
"*Context.* I have tens of SQL queries stored in separate files. For
benchmarking purposes, I created an application that iterates through
each of those query files and passes it to a standalone Spark
application. The latter /first/ parses the query, extracts the tables
it uses, registers them (using registerTempTable() in Spark < 2 and
createOrReplaceTempView() in Spark 2), and then executes the query
(spark.sql()).
*Challenge.* Since registering the tables can sometimes be
time-consuming, I would like to register each table only once, when it
is first used, and keep that in the form of metadata that can readily be
reused by subsequent queries without registering the tables again. It is
a sort of intra-job caching, but not any of the caching Spark offers
(table caching), as far as I know.
Is that possible? If not, can anyone suggest another approach to
accomplish the same goal (i.e., iterating through separate query files
and running a querying Spark application without re-registering tables
that have already been registered)?"
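
To make the goal concrete, here is a minimal sketch of the "register only
once" bookkeeping, written in Python. Everything in it is an assumption
for illustration: the names TableRegistry, extract_tables and run_queries
are hypothetical, the regex-based table extraction is deliberately
simplistic, and the sketch assumes all queries run inside one long-lived
Spark session, since temporary views are scoped to the session that
created them. The Spark calls are passed in as plain callables, so the
bookkeeping itself does not depend on Spark.

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical, simplistic extractor: takes the identifier following
# FROM or JOIN. A real application would use a proper SQL parser.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def extract_tables(query: str) -> Set[str]:
    """Return the set of table names referenced after FROM/JOIN."""
    return set(TABLE_RE.findall(query))


class TableRegistry:
    """Remembers which tables were already registered in this session."""

    def __init__(self, register: Callable[[str], None]):
        self._register = register      # performs the actual registration
        self._seen: Set[str] = set()   # names registered so far

    def ensure(self, tables: Iterable[str]) -> None:
        """Register only the tables that have not been seen before."""
        for name in sorted(tables):
            if name not in self._seen:
                self._register(name)
                self._seen.add(name)


def run_queries(queries: Iterable[str],
                registry: TableRegistry,
                execute: Callable[[str], object]) -> List[object]:
    """Run each query, registering its tables first if needed."""
    results = []
    for query in queries:
        registry.ensure(extract_tables(query))
        results.append(execute(query))
    return results


# Possible wiring with PySpark (assumed setup, hypothetical data path):
#   spark = SparkSession.builder.getOrCreate()
#   def register(name):
#       spark.read.parquet(f"/data/{name}.parquet").createOrReplaceTempView(name)
#   run_queries(all_query_texts, TableRegistry(register), spark.sql)
```

The design choice is to keep a single application alive for the whole
benchmark and loop over the query files inside it, rather than starting
a new Spark application per file; the set of seen table names then plays
the role of the metadata described above.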
[1]:
http://stackoverflow.com/questions/40549924/sparksql-intra-sparksql-application-table-registration
Cheers,
Mohamed