Posted to user@spark.apache.org by Junfeng Chen <da...@gmail.com> on 2018/03/27 07:37:09 UTC

Queries with streaming sources must be executed with writeStream.start();;

I am reading some data from Kafka and want to save it to Parquet on HDFS with
Structured Streaming. The data from Kafka is in JSON format. I try to convert
it to Dataset<Row> with spark.read.json(). However, I get this exception:
>
> Queries with streaming sources must be executed with
> writeStream.start()

Here is my code:
>
> Dataset<Row> df = spark.readStream().format("kafka")...
> Dataset<String> jsonDataset = df.selectExpr("CAST(value AS STRING)").map...
> Dataset<Row> rowDataset = spark.read().json(jsonDataset);
>
> rowDataset.writeStream().outputMode(OutputMode.Append()).partitionBy("appname")
>     .format("parquet").option("path", savePath).start().awaitTermination();
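
For reference, below is a minimal sketch of the same Kafka-to-Parquet pipeline
with the JSON parsed by functions.from_json inside the streaming query, rather
than by the batch-only spark.read().json() call. The schema fields, Kafka
options, output path, and checkpoint path are placeholders, and it assumes
Spark 2.1+ with the spark-sql-kafka connector on the classpath:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class KafkaJsonToParquet {
    public static void main(String[] args) throws StreamingQueryException {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaJsonToParquet")
                .getOrCreate();

        // Hypothetical schema of the JSON messages -- replace with the real fields.
        StructType schema = new StructType()
                .add("appname", DataTypes.StringType)
                .add("payload", DataTypes.StringType);

        // Placeholder Kafka connection settings.
        Dataset<Row> df = spark.readStream().format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "mytopic")
                .load();

        // Parse the JSON value column inside the streaming plan itself,
        // so no separate batch query is created on streaming data.
        Dataset<Row> rowDataset = df
                .selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), schema).as("data"))
                .select("data.*");

        rowDataset.writeStream()
                .outputMode(OutputMode.Append())
                .partitionBy("appname")
                .format("parquet")
                .option("path", "/tmp/parquet-out")              // placeholder output path
                .option("checkpointLocation", "/tmp/checkpoint") // required by the file sink
                .start()
                .awaitTermination();
    }
}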



How can I solve this?

Thanks!

Regards,
Junfeng Chen