Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2017/10/25 18:59:16 UTC

Structured Stream in Spark

Hi,

Could anyone provide suggestions on how to parse JSON data from Kafka and
load it into Hive?

I have read about Structured Streaming but didn't find any examples. Is
there a best practice for reading and parsing the data with Structured
Streaming for this use case?

Thanks,
Asmath

Re: Structured Stream in Spark

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Yes, I checked both the output location and the console; neither has any
data.

The link below also has the code and the question I raised with Azure
HDInsight:

https://github.com/Azure/spark-eventhubs/issues/195

Re: Structured Stream in Spark

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
The code in the link writes the data into files. Did you check the output
location?

By the way, if you want to see the data on the console, you can use the
console sink by changing the line *format("parquet").option("path",
outputPath + "/ETL").partitionBy("creationTime").start()* to
*format("console").start()*.
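
[Editor's note: for reference, the swap described here looks like the
following sketch. The streaming DataFrame `df` and the `outputPath` value
are assumed to come from the linked example; they are not defined here.]

```scala
// File sink, as in the linked code: writes Parquet files under outputPath.
val fileQuery = df.writeStream
  .format("parquet")
  .option("path", outputPath + "/ETL")
  .partitionBy("creationTime")
  .start()

// Console sink for debugging: prints each micro-batch to stdout instead.
val consoleQuery = df.writeStream
  .format("console")
  .start()
```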

Re: Structured Stream in Spark

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Hi TathagataDas,

I was trying to use Event Hubs with Spark Structured Streaming. It looks
like the connection was made successfully, but I cannot see any data on the
console. Not sure whether Event Hubs is supported or not.

https://github.com/Azure/spark-eventhubs/blob/master/examples/src/main/scala/com/microsoft/spark/sql/examples/EventHubsStructuredStreamingExample.scala

is the code snippet I used to connect to Event Hubs.

Thanks,
Asmath

Re: Structured Stream in Spark

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Thanks TD.

Re: Structured Stream in Spark

Posted by Tathagata Das <ta...@gmail.com>.
Please do not confuse the old Spark Streaming (DStreams) with Structured
Streaming: Structured Streaming's offset and checkpoint management is far
more robust than that of DStreams.
Take a look at my talk -
https://spark-summit.org/2017/speakers/tathagata-das/

Re: Structured Stream in Spark

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Thanks Subhash.

Have you ever used the zero-data-loss approach with streaming? I am a bit
worried about using streaming when it comes to data loss.

https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/

Does Structured Streaming handle this internally?

Re: Structured Stream in Spark

Posted by Subhash Sriram <su...@gmail.com>.
No problem! Take a look at this:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing

Thanks,
Subhash
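
[Editor's note: as a concrete illustration of the linked guide section, the
checkpoint directory is specified per query through the `checkpointLocation`
option on `writeStream`. A minimal sketch, assuming a streaming DataFrame
`df`; the sink and checkpoint paths are illustrative.]

```scala
// The checkpoint directory is set per query, not globally.
// `df` and both paths are illustrative assumptions.
val query = df.writeStream
  .format("parquet")
  .option("path", "/tmp/streaming/output")
  .option("checkpointLocation", "/tmp/streaming/checkpoints")
  .start()
```

On restart, Spark recovers the query's progress, including source offsets,
from that directory, which is what the "recovering from failures" section
of the guide describes.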

Re: Structured Stream in Spark

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Hi Sriram,

Thanks. This is what I was looking for.

One question: where do we specify the checkpoint directory when using
Structured Streaming?

Thanks,
Asmath

Re: Structured Stream in Spark

Posted by Subhash Sriram <su...@gmail.com>.
Hi Asmath,

Here is an example of using structured streaming to read from Kafka:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala

In terms of parsing the JSON, there is a from_json function that you can
use. The following might help:

https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html

I hope this helps.

Thanks,
Subhash
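
[Editor's note: putting the two links together, a minimal end-to-end sketch
of the use case might look like the following. The broker address, topic
name, schema, and paths are illustrative assumptions, and the
`spark-sql-kafka-0-10` package must be on the classpath.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object KafkaJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaJsonSketch").getOrCreate()
    import spark.implicits._

    // Illustrative schema for the JSON payload in the Kafka `value` field.
    val schema = new StructType()
      .add("id", LongType)
      .add("name", StringType)

    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // illustrative broker
      .option("subscribe", "events")                    // illustrative topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json")      // Kafka `value` is binary
      .select(from_json($"json", schema).as("data"))    // parse the JSON string
      .select("data.*")                                 // flatten to top-level columns

    // One way to reach Hive: write Parquet files under the location of an
    // external Hive table (paths are illustrative).
    parsed.writeStream
      .format("parquet")
      .option("path", "/warehouse/events")
      .option("checkpointLocation", "/checkpoints/events")
      .start()
      .awaitTermination()
  }
}
```

An external Hive table defined over `/warehouse/events` would then pick up
the new files as they arrive.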
