Posted to user@spark.apache.org by Imran Rajjad <ra...@gmail.com> on 2017/11/15 13:47:24 UTC
spark structured csv file stream not detecting new files
Greetings,
I am running a unit test designed to stream a folder into which I am manually
copying csv files. The files are not always picked up; they are only detected
when the job starts with the files already in the folder.
I even tried the fileNameOnly option newly introduced in 2.2.0. Have I
missed something in the documentation? This problem does not seem to
occur in the DStreams examples.
DataStreamReader reader = spark.readStream()
        .option("fileNameOnly", true)
        .option("header", true)
        .schema(userSchema);

Dataset<Row> csvDF = reader.csv(watchDir);
Dataset<Row> results = csvDF.groupBy("myCol").count();

MyForEach forEachObj = new MyForEach();
query = results
        .writeStream()
        .foreach(forEachObj) // foreach never gets called
        .outputMode("complete")
        .start();
--
I.R