Posted to user@spark.apache.org by yu <yu...@iastate.edu> on 2014/11/30 00:16:15 UTC

Generating a DStream by existing textfiles

Hello Everyone,

I am learning Spark Streaming and would like a convenient way to generate a
data stream from text files for some simple experiments. From the Spark
scaladoc, it looks like the methods 'textFileStream' and 'fileStream'
only monitor new files coming in, not existing files. Is there a method
in Spark that I could use directly? For example, if I have text1 in a folder,
how can I generate a DStream containing the data from text1?

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Generating-a-DStream-by-existing-textfiles-tp20030.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Generating a DStream by existing textfiles

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
If you look at the API doc
<https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.streaming.StreamingContext>,
you can see that fileStream has a boolean parameter (newFilesOnly). Setting
it to false should pick up the existing files, it seems.
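
A minimal sketch of that suggestion, assuming the three-argument fileStream
overload (directory, filter, newFilesOnly) listed in the 1.1.0 scaladoc. The
directory path, the application name, the local master, and the hidden-file
filter are placeholders chosen for illustration, not something taken from
this thread, and I have not verified how every Spark version treats files
that are already present.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ExistingFilesStream {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master; local[2] is only for a quick experiment.
    val conf = new SparkConf().setAppName("ExistingFilesStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Same key/value/input-format types that textFileStream uses,
    // but with newFilesOnly = false so files already in the folder are considered.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/path/to/folder",                              // hypothetical directory holding text1
      (path: Path) => !path.getName.startsWith("."),  // assumed filter: skip hidden files
      newFilesOnly = false
    ).map { case (_, text) => text.toString }         // keep only the line contents

    lines.print()          // print the first few lines of each batch
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The existing files should show up in an early batch, and later batches only
contain files added afterwards. Some later Spark releases may still skip files
whose modification time is too old, so if nothing appears it is worth checking
the fileStream notes for the version in use.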

Thanks
Best Regards
