Posted to dev@spark.apache.org by Renyi Xiong <re...@gmail.com> on 2015/11/13 19:52:39 UTC

let spark streaming sample come to stop

Hi,

I am trying to run the following 1.4.1 sample by putting a words.txt under
localdir:

bin\run-example org.apache.spark.examples.streaming.HdfsWordCount localdir

Two questions:

1. It does not pick up words.txt, I guess because the file is 'old'. Is there
an option to have it picked up?
2. I managed to put a 'new' file in on the fly, which got picked up, but after
processing it the program doesn't stop (it keeps generating empty RDDs
instead). Is there an option to make it stop when no new files come in?
Otherwise it blocks the other samples when I want to run several.

Thanks,
Renyi.

Re: let spark streaming sample come to stop

Posted by Renyi Xiong <re...@gmail.com>.
I see, thanks a lot

On Mon, Nov 16, 2015 at 6:29 PM, Bryan Cutler <cu...@gmail.com> wrote:

> Hi Renyi,
>
> This is the intended behavior of the streaming HdfsWordCount example. It
> uses 'textFileStream', which monitors an HDFS directory for newly created
> files and pushes them into a DStream. It is meant to run indefinitely
> until interrupted, for example by Ctrl-C.
>
> -bryan

Re: let spark streaming sample come to stop

Posted by Bryan Cutler <cu...@gmail.com>.
Hi Renyi,

This is the intended behavior of the streaming HdfsWordCount example. It
uses 'textFileStream', which monitors an HDFS directory for newly created
files and pushes them into a DStream. It is meant to run indefinitely
until interrupted, for example by Ctrl-C.

-bryan
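
The thread does not mention any switch on HdfsWordCount that makes it exit
by itself. For anyone who wants a word count that stops once the directory
goes quiet, here is a rough PySpark sketch of one way it might be done. It
assumes a Spark release in which RDD.isEmpty and
StreamingContext.awaitTerminationOrTimeout are available. IdleTracker,
run_until_idle, batch_seconds and max_idle_batches are names made up for
this illustration and are not part of Spark or of the shipped example.

```python
class IdleTracker(object):
    """Counts consecutive empty batches (hypothetical helper, not a Spark class)."""

    def __init__(self, max_idle_batches):
        self.max_idle_batches = max_idle_batches
        self.idle_batches = 0

    def record(self, batch_is_empty):
        # A non-empty batch resets the streak; an empty one extends it.
        if batch_is_empty:
            self.idle_batches += 1
        else:
            self.idle_batches = 0

    def should_stop(self):
        return self.idle_batches >= self.max_idle_batches


def run_until_idle(directory, batch_seconds=2, max_idle_batches=5):
    """Word count over new files in `directory`; stops after a run of empty batches."""
    # Imported here so IdleTracker can be used and tested without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="HdfsWordCountUntilIdle")
    ssc = StreamingContext(sc, batch_seconds)
    tracker = IdleTracker(max_idle_batches)

    lines = ssc.textFileStream(directory)
    # The foreachRDD callback runs on the driver once per batch interval.
    lines.foreachRDD(lambda rdd: tracker.record(rdd.isEmpty()))

    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    # Poll with a timeout instead of blocking forever in awaitTermination().
    while not tracker.should_stop():
        if ssc.awaitTerminationOrTimeout(batch_seconds):
            return  # the context was already stopped some other way
    # Positional arguments: stop the SparkContext too, and stop gracefully.
    ssc.stop(True, True)
```

A script submitted with spark-submit could then call
run_until_idle("localdir"). The idle threshold is a judgment call: too small
and the job may exit before a slow upload lands, too large and it keeps
other samples waiting.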
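
On the first question (the 'old' words.txt being ignored): as far as I
understand it, the file stream decides what is new from each file's
modification time, and the Spark Streaming programming guide asks for files
to be moved or renamed atomically into the monitored directory. The Scala
API also has a fileStream overload with a newFilesOnly flag, but how far
back it looks is not something this thread establishes. A minimal sketch for
a local directory like the one in the post, with stage_into being a
hypothetical helper name: copy the file so the copy gets a fresh timestamp,
then rename it into place after the job has started.

```python
import os
import shutil
import tempfile


def stage_into(watched_dir, source_path):
    """Copy source_path, then move the copy into watched_dir with one rename.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    name = os.path.basename(source_path)
    # Stage next to the watched directory so the rename stays on one filesystem.
    parent = os.path.dirname(os.path.abspath(watched_dir))
    staging_dir = tempfile.mkdtemp(prefix="staging-", dir=parent)
    try:
        staged = os.path.join(staging_dir, name)
        # copyfile writes a new file, so the copy gets a current modification time.
        shutil.copyfile(source_path, staged)
        target = os.path.join(watched_dir, name)
        # os.replace (Python 3.3+) renames and overwrites any existing target.
        os.replace(staged, target)
        return target
    finally:
        shutil.rmtree(staging_dir, ignore_errors=True)
```

Run it only after the streaming job is up, for example
stage_into("localdir", "words.txt"). On HDFS the same idea would need the
filesystem's own copy and rename operations rather than os and shutil.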