Posted to user@spark.apache.org by AJT <at...@currenex.com> on 2016/10/06 13:40:34 UTC

Spark SQL query

From what I have read about Spark SQL, you need to already have a dataframe
which you can then query, e.g. select * from myDataframe where
<conditions>
where the dataframe is backed by a Hive table, an Avro file, etc.
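
For example, the pattern I mean is roughly this (the path, table and column
names are just placeholders, assuming Spark 2.x with the spark-avro package):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("example").getOrCreate()

// A dataframe backed by some existing data (hypothetical Avro path)
val myDataframe = spark.read
  .format("com.databricks.spark.avro")
  .load("/data/events/*.avro")

// Register it so it can be referenced by name in SQL
myDataframe.createOrReplaceTempView("myDataframe")
val filtered = spark.sql("SELECT * FROM myDataframe WHERE event_date = '2016-10-06'")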

What if you want to create a dataframe from your underlying data on the fly,
with input parameters passed into your job?
i.e. (a rough sketch follows this list):
1. Read my data files (e.g. Avro) into a dataframe, depending on what
arguments are passed (e.g. a date range)
2. Perform map / mapPartitions / filter / groupBy functions on the dataframe
to create a new dataframe
3. Output this dataframe
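
A minimal sketch of steps 1-3, assuming Spark 2.x with the spark-avro package
on the classpath; the paths, column names and argument names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParameterisedJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical date-range arguments, e.g. "2016-10-01" "2016-10-06"
    val Array(startDate, endDate) = args

    val spark = SparkSession.builder().appName("ParameterisedJob").getOrCreate()

    // 1. Read the Avro files into a dataframe, filtered by the date arguments
    val input = spark.read
      .format("com.databricks.spark.avro")
      .load("/data/events/*.avro")
      .filter(col("event_date").between(startDate, endDate))

    // 2. filter / groupBy the dataframe to create a new dataframe
    val result = input
      .filter(col("status") === "OK")
      .groupBy(col("event_date"))
      .agg(count(lit(1)).as("events"))

    // 3. Output the new dataframe
    result.write.mode("overwrite").parquet("/output/events_by_date")

    spark.stop()
  }
}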

I can see how to do this in a standard Spark application (e.g. run via
spark-submit), but what if I want to use one of the myriad of tools
(Tableau, Qlik, etc.) that are Spark SQL compliant and run my job from there?
Is there a way I can do:

select * from
functions_on_dataframe_which_output_dataframe(dataframe_built_from_input_arguments)
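
The closest thing I can think of, instead of a table-valued function like the
above: build the parameterised dataframe in a long-running Spark application,
register it as a temp view, and start the Thrift JDBC/ODBC server inside the
same context so that Tableau/Qlik can query it over JDBC. A minimal sketch,
assuming a Spark build with Hive/Thrift-server support and reusing the
hypothetical "result" dataframe from the sketch above:

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Register the dataframe under a (hypothetical) table name
result.createOrReplaceTempView("events_by_date")

// Expose this context's temp views over JDBC/ODBC (default port 10000)
HiveThriftServer2.startWithContext(spark.sqlContext)

// A tool connected through the Spark SQL JDBC/ODBC driver could then run:
//   SELECT * FROM events_by_date WHERE event_date >= '2016-10-01'
// The parameters end up in the tool's WHERE clause rather than being passed
// into a function, since temp views cannot take arguments at query time.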

Appreciate any help



