Posted to user@spark.apache.org by Spark Enthusiast <sp...@yahoo.in> on 2015/08/03 16:57:05 UTC

How do I Process Streams that span multiple lines?

All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example).
How do I process streams that span multiple lines? Are there examples that I can use?

Re: How do I Process Streams that span multiple lines?

Posted by Michal Čizmazia <mi...@gmail.com>.
Sorry.

SparkContext.wholeTextFiles

Not sure about streams.
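
A minimal sketch of reading whole files as single records with
wholeTextFiles, assuming a SparkContext named sc and a hypothetical input
directory /data/in; each record is a (path, fileContents) pair, so one
record can span many lines:

import org.apache.spark.{SparkConf, SparkContext}

object WholeFileExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WholeFileExample")
    val sc = new SparkContext(conf)

    // Each element is (filePath, entireFileContents);
    // the contents may contain embedded newlines.
    val files = sc.wholeTextFiles("/data/in")  // placeholder directory

    // Example: count lines per file without losing file boundaries.
    val lineCounts = files.mapValues(content => content.split("\n").length)
    lineCounts.collect().foreach { case (path, n) =>
      println(s"$path: $n lines")
    }

    sc.stop()
  }
}

Because file boundaries are preserved, per-file parsing (JSON, XML,
multi-line logs) can run on the full contents rather than on single lines.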

On 3 August 2015 at 14:50, Michal Čizmazia <mi...@gmail.com> wrote:

> Are you looking for RDD.wholeTextFiles?
>
> On 3 August 2015 at 10:57, Spark Enthusiast <sp...@yahoo.in>
> wrote:
>
>> All  examples of Spark Stream programming that I see assume streams of
>> lines that are then tokenised and acted upon (like the WordCount example).
>>
>> How do I process Streams that span multiple lines? Are there examples
>> that I can use?
>>
>
>

Re: How do I Process Streams that span multiple lines?

Posted by Michal Čizmazia <mi...@gmail.com>.
Are you looking for RDD.wholeTextFiles?

On 3 August 2015 at 10:57, Spark Enthusiast <sp...@yahoo.in>
wrote:

> All  examples of Spark Stream programming that I see assume streams of
> lines that are then tokenised and acted upon (like the WordCount example).
>
> How do I process Streams that span multiple lines? Are there examples that
> I can use?
>

Re: How do I Process Streams that span multiple lines?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
If you are using Kafka, you can push an entire file as a single message to
Kafka. In that case your DStream will receive one message containing the
whole contents of the file, which can of course span multiple lines.

Thanks
Best Regards

On Mon, Aug 3, 2015 at 8:27 PM, Spark Enthusiast <sp...@yahoo.in>
wrote:

> All  examples of Spark Stream programming that I see assume streams of
> lines that are then tokenised and acted upon (like the WordCount example).
>
> How do I process Streams that span multiple lines? Are there examples that
> I can use?
>
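
To make the Kafka suggestion above concrete, here is a minimal sketch using
the Spark Streaming Kafka receiver (spark-streaming-kafka, Kafka 0.8-style
API). The ZooKeeper address, consumer group, and topic name are
placeholders; the point is that each message value arrives as one complete,
possibly multi-line document:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiLineKafkaStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiLineKafkaStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Each received message value is the full contents of one file,
    // so it can contain embedded newlines.
    val messages = KafkaUtils.createStream(
      ssc,
      "zookeeper-host:2181",   // placeholder ZooKeeper quorum
      "file-group",            // placeholder consumer group
      Map("files" -> 1))       // placeholder topic -> receiver threads

    // Process each whole document instead of individual lines,
    // e.g. count the lines inside every multi-line message.
    messages.map { case (_, fileContents) =>
      fileContents.split("\n").length
    }.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

On the producer side, the entire file contents are sent as the value of a
single Kafka message, so Spark never splits the document into lines.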