Posted to user@spark.apache.org by Imran Rajjad <ra...@gmail.com> on 2017/11/15 13:47:24 UTC

spark structured csv file stream not detecting new files

Greetings,
I am running a unit test that streams a folder into which I am manually
copying csv files. The files do not always get picked up; they are only
detected when the job starts with the files already in the folder.

I even tried the fileNameOnly option newly added in 2.2.0. Have I missed
something in the documentation? This problem does not seem to occur in
the DStreams examples.
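For comparison, the DStreams approach I had in mind is roughly the sketch
below (a minimal example, not my actual test; the directory, batch interval
and app name are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DStreamWatch {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("DStreamWatch");
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));
    // textFileStream picks up files created in the directory after the job starts
    JavaDStream<String> lines = ssc.textFileStream("/path/to/watchDir");
    lines.print();
    ssc.start();
    ssc.awaitTermination();
  }
}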


DataStreamReader reader = spark.readStream()
    .option("fileNameOnly", true)
    .option("header", true)
    .schema(userSchema);

Dataset<Row> csvDF = reader.csv(watchDir);

Dataset<Row> results = csvDF.groupBy("myCol").count();
MyForEach forEachObj = new MyForEach();
query = results
    .writeStream()
    .foreach(forEachObj) // foreach never gets called
    .outputMode("complete")
    .start();
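For completeness, here is the whole setup condensed into a self-contained
sketch (userSchema, watchDir, and the ForeachWriter body are simplified
stand-ins for my actual code; the awaitTermination call keeps the test JVM
alive so micro-batches can actually run):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.ForeachWriter;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.types.StructType;

public class CsvStreamTest {
  public static void main(String[] args) throws StreamingQueryException {
    SparkSession spark = SparkSession.builder()
        .master("local[2]").appName("CsvStreamTest").getOrCreate();

    // placeholder schema; the real one has more columns
    StructType userSchema = new StructType().add("myCol", "string");
    String watchDir = "/path/to/watchDir";

    Dataset<Row> csvDF = spark.readStream()
        .option("fileNameOnly", true)
        .option("header", true)
        .schema(userSchema)
        .csv(watchDir);

    Dataset<Row> results = csvDF.groupBy("myCol").count();

    StreamingQuery query = results.writeStream()
        .foreach(new ForeachWriter<Row>() { // stand-in for MyForEach
          @Override public boolean open(long partitionId, long version) { return true; }
          @Override public void process(Row row) { System.out.println(row); }
          @Override public void close(Throwable errorOrNull) { }
        })
        .outputMode("complete")
        .start();

    query.awaitTermination(); // block so the streaming query keeps running
  }
}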

-- 
I.R