Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/12/06 18:07:13 UTC
sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?
Hi All,
I have the following two snippets of code and I wonder what the
difference is between them and which one I should use. I am using Spark
2.2.
Dataset<Row> df = sparkSession.readStream()
.format("kafka")
.load();
df.createOrReplaceTempView("table");
df.printSchema();
Dataset<Row> resultSet = df.sqlContext().sql("select value from table");
// sparkSession.sql(this.query);

StreamingQuery streamingQuery = resultSet
    .writeStream()
    .trigger(Trigger.ProcessingTime(1000))
    .format("console")
    .start();
vs
Dataset<Row> df = sparkSession.readStream()
.format("kafka")
.load();
df.createOrReplaceTempView("table");
Dataset<Row> resultSet = sparkSession.sql("select value from table");
// sparkSession.sql(this.query);

StreamingQuery streamingQuery = resultSet
    .writeStream()
    .trigger(Trigger.ProcessingTime(1000))
    .format("console")
    .start();
Thanks!
Re: sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?
Posted by khathiravan raj maadhaven <mk...@gmail.com>.
Hi Kant,
Based on my understanding, the only practical difference is which entry
point is used to resolve the SQLContext for the query you pass in. As the
table/view is already registered, sparkSession.sql("your query")
should be simple and good enough.

The following uses the session that is already created and available:

    sparkSession.sql("select value from table")

while the following goes through the Dataset's SQLContext wrapper. In
Spark 2.x, SQLContext is kept for backward compatibility as a thin
wrapper around SparkSession, so this resolves to the same underlying
session rather than creating a new one; the extra indirection is the
only overhead:

    df.sqlContext().sql("select value from table")
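To make the comparison concrete, here is a minimal end-to-end sketch of the original snippet with both call styles side by side. This is illustrative only: the broker address ("localhost:9092"), topic name ("events"), and local master are placeholder assumptions, and running it requires a Spark 2.x build with the spark-sql-kafka package on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class SqlEntryPoints {
    public static void main(String[] args) throws Exception {
        // Assumption: local mode, for illustration only.
        SparkSession sparkSession = SparkSession.builder()
                .appName("sql-entry-points")
                .master("local[*]")
                .getOrCreate();

        // Assumption: broker address and topic name are placeholders.
        Dataset<Row> df = sparkSession.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "events")
                .load();

        df.createOrReplaceTempView("table");

        // In Spark 2.x both of these resolve to the same underlying
        // SparkSession; df.sqlContext() just adds a wrapper hop.
        Dataset<Row> viaSession = sparkSession.sql("select value from table");
        Dataset<Row> viaContext = df.sqlContext().sql("select value from table");

        // Either Dataset can feed the streaming sink.
        StreamingQuery streamingQuery = viaSession
                .writeStream()
                .trigger(Trigger.ProcessingTime(1000))
                .format("console")
                .start();
        streamingQuery.awaitTermination();
    }
}
```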
Regards
Raj
On Wed, Dec 6, 2017 at 6:07 PM, kant kodali <ka...@gmail.com> wrote:
> Hi All,
>
> I have the following snippets of the code and I wonder what is the
> difference between these two and which one should I use? I am using spark
> 2.2.
>
> Dataset<Row> df = sparkSession.readStream()
> .format("kafka")
> .load();
>
> df.createOrReplaceTempView("table");
> df.printSchema();
>
> Dataset<Row> resultSet = df.sqlContext().sql("select value from table");
> // sparkSession.sql(this.query);
> StreamingQuery streamingQuery = resultSet
>     .writeStream()
>     .trigger(Trigger.ProcessingTime(1000))
>     .format("console")
>     .start();
>
>
> vs
>
>
> Dataset<Row> df = sparkSession.readStream()
> .format("kafka")
> .load();
>
> df.createOrReplaceTempView("table");
>
> Dataset<Row> resultSet = sparkSession.sql("select value from table");
> // sparkSession.sql(this.query);
> StreamingQuery streamingQuery = resultSet
>     .writeStream()
>     .trigger(Trigger.ProcessingTime(1000))
>     .format("console")
>     .start();
>
>
> Thanks!
>
>