Posted to user@spark.apache.org by Spark Enthusiast <sp...@yahoo.in> on 2015/08/03 16:57:05 UTC
How do I Process Streams that span multiple lines?
All the Spark Streaming examples that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example).
How do I process streams whose records span multiple lines? Are there examples that I can use?
Re: How do I Process Streams that span multiple lines?
Posted by Michal Čizmazia <mi...@gmail.com>.
Sorry, I meant SparkContext.wholeTextFiles.
I am not sure about streams.
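For reference, a minimal PySpark sketch of the batch approach mentioned above. SparkContext.wholeTextFiles returns one (path, content) pair per file, so records that span several lines stay together. The blank-line record delimiter, the helper names split_records and load_records, and the input path are all assumptions made for illustration; none of them come from this thread. As noted, this reads files in batch and does not by itself answer the streaming case.

```python
def split_records(content, delimiter="\n\n"):
    """Split one file's text into multi-line records.

    Assumes records are separated by blank lines (hypothetical format).
    """
    return [chunk.strip() for chunk in content.split(delimiter) if chunk.strip()]


def load_records(sc, path):
    """Return an RDD with one element per multi-line record."""
    # wholeTextFiles yields (file_path, file_content) pairs, one per file,
    # so the content of each file arrives as a single string.
    return sc.wholeTextFiles(path).flatMap(lambda pair: split_records(pair[1]))


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="MultiLineRecords")
    # Hypothetical input directory.
    for record in load_records(sc, "hdfs:///data/multiline/").take(5):
        print(record)
    sc.stop()
```

Calling main() from a script submitted with spark-submit would print the first five records. Since wholeTextFiles loads each file fully into memory, it is best suited to many small files.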
On 3 August 2015 at 14:50, Michal Čizmazia <mi...@gmail.com> wrote:
> Are you looking for RDD.wholeTextFiles?
Re: How do I Process Streams that span multiple lines?
Posted by Michal Čizmazia <mi...@gmail.com>.
Are you looking for RDD.wholeTextFiles?
On 3 August 2015 at 10:57, Spark Enthusiast <sp...@yahoo.in>
wrote:
> All examples of Spark Stream programming that I see assume streams of
> lines that are then tokenised and acted upon (like the WordCount example).
>
> How do I process Streams that span multiple lines? Are there examples that
> I can use?
>
Re: How do I Process Streams that span multiple lines?
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
If you are using Kafka, you can push an entire file to Kafka as a single
message. Your DStream then receives one message containing the full
contents of the file, which can of course span multiple lines.
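A rough PySpark sketch of the consuming side, assuming the Spark 1.x Python Kafka integration (pyspark.streaming.kafka.KafkaUtils), which is not available in later Spark releases. The helper names file_message_to_lines and build_stream, and the topic and broker values passed to them, are hypothetical.

```python
def file_message_to_lines(message):
    """Turn one Kafka (key, value) message holding a whole file into its lines."""
    _key, value = message
    return value.splitlines()


def build_stream(ssc, topic, brokers):
    """Create a DStream in which each element is the list of lines of one file."""
    # Assumption: Spark 1.x Kafka integration; imported here so the helper
    # above can be used without Spark installed.
    from pyspark.streaming.kafka import KafkaUtils

    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers}
    )
    # Each element is one (key, value) message; the value is the whole file,
    # so a multi-line record is never split across messages.
    return stream.map(file_message_to_lines)
```

For example, build_stream(ssc, "files", "localhost:9092").pprint() would print each received file as a list of lines. Kafka brokers cap the message size (the message.max.bytes setting), so this approach probably fits small files best.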
Thanks
Best Regards
On Mon, Aug 3, 2015 at 8:27 PM, Spark Enthusiast <sp...@yahoo.in>
wrote:
> All examples of Spark Stream programming that I see assume streams of
> lines that are then tokenised and acted upon (like the WordCount example).
>
> How do I process Streams that span multiple lines? Are there examples that
> I can use?
>