Posted to user@spark.apache.org by Deepak Sharma <de...@gmail.com> on 2019/06/09 13:08:12 UTC
Read hdfs files in spark streaming
I am using a Spark Streaming application to read from Kafka.
The value of each Kafka message is the path to an HDFS file.
I am using Spark 2.x with spark.readStream.
What is the best way to get this path in the streaming app and then load
the JSON stored at that HDFS path into a DataFrame inside the streaming
app, maybe using spark.read.json?
Thanks a lot in advance
--
Thanks
Deepak
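A minimal sketch of the Kafka side in PySpark, assuming Spark 2.4 or later with the spark-sql-kafka-0-10 package available. The function names, broker address and topic name are hypothetical. The Kafka source delivers the message value as binary, so it has to be cast to a string before it can be used as a path; decode_paths does the same decoding in plain Python, for example on values collected to the driver.

```python
from typing import Iterable, List, Optional


def decode_paths(values: Iterable[Optional[bytes]]) -> List[str]:
    """Turn raw Kafka message values (bytes) into clean path strings.

    Null and blank values are dropped.
    """
    paths = []
    for value in values:
        if value is None:
            continue
        path = value.decode("utf-8").strip()
        if path:
            paths.append(path)
    return paths


def build_path_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with a single string column named "path".

    `spark` is an existing SparkSession; pyspark is imported lazily so the
    helper above can be used without a Spark installation.
    """
    from pyspark.sql.functions import col, trim

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # The Kafka source exposes "value" as binary; cast it to text.
    paths = raw.select(trim(col("value").cast("string")).alias("path"))
    # Null or empty paths evaluate to null/false here and are filtered out.
    return paths.where(col("path") != "")
```

For example, build_path_stream(spark, "localhost:9092", "hdfs-paths") yields a stream of paths; turning each micro-batch of paths into JSON DataFrames is a separate step.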
Re: Read hdfs files in spark streaming
Posted by nitin jain <ni...@gmail.com>.
Hi Deepak,
Please let us know how you managed it.
Thanks,
NJ
On Mon, Jun 10, 2019 at 4:42 PM Deepak Sharma <de...@gmail.com> wrote:
Re: Read hdfs files in spark streaming
Posted by Deepak Sharma <de...@gmail.com>.
Thanks All.
I managed to get this working.
Marking this thread as closed.
On Mon, Jun 10, 2019 at 4:14 PM Deepak Sharma <de...@gmail.com> wrote:
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Re: Read hdfs files in spark streaming
Posted by Deepak Sharma <de...@gmail.com>.
This is a project requirement: the paths are streamed into a Kafka topic.
It seems this is not possible with Spark Structured Streaming.
On Mon, Jun 10, 2019 at 3:59 PM Shyam P <sh...@gmail.com> wrote:
--
Thanks
Deepak
Re: Read hdfs files in spark streaming
Posted by Shyam P <sh...@gmail.com>.
Hi Deepak,
Why are you getting paths from a Kafka topic? Is there any specific reason to do so?
Regards,
Shyam
On Mon, Jun 10, 2019 at 10:44 AM Deepak Sharma <de...@gmail.com>
wrote:
Re: Read hdfs files in spark streaming
Posted by Deepak Sharma <de...@gmail.com>.
The context is different here.
The file paths arrive as messages in a Kafka topic.
Spark Structured Streaming consumes from this topic.
It then has to take the value from each message, i.e. the path to a file,
and read the JSON stored at that file location into another DataFrame.
Thanks
Deepak
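One way to do the second step, assuming Spark 2.4 or later: DataStreamWriter.foreachBatch hands each micro-batch to a function as an ordinary (non-streaming) DataFrame, and inside that function a batch read such as spark.read.json(list_of_paths) is allowed. A minimal sketch; the function names, broker, topic, checkpoint and output locations are hypothetical, as is writing the result out as Parquet.

```python
from typing import Iterable, List


def unique_paths(paths: Iterable[str]) -> List[str]:
    """Remove duplicate and empty paths, keeping first-seen order."""
    seen = set()
    result = []
    for path in paths:
        if path and path not in seen:
            seen.add(path)
            result.append(path)
    return result


def read_json_for_batch(batch_df, batch_id):
    """foreachBatch handler: batch_df is a static DataFrame with a string "path" column."""
    # Only the path strings are collected to the driver, not the file contents.
    paths = unique_paths(row["path"] for row in batch_df.select("path").collect())
    if not paths:
        return  # no usable paths in this micro-batch

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # A regular batch read: DataFrameReader.json accepts a list of paths.
    json_df = spark.read.json(paths)
    # Hypothetical sink; replace with whatever the application does with the data.
    json_df.write.mode("append").parquet("hdfs:///tmp/json-from-kafka-paths")


def start_query(spark, bootstrap_servers, topic, checkpoint_dir):
    """Consume paths from Kafka and run read_json_for_batch on every micro-batch."""
    paths = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        .selectExpr("CAST(value AS STRING) AS path")
    )
    return (
        paths.writeStream.foreachBatch(read_json_for_batch)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

start_query(spark, "localhost:9092", "hdfs-paths", "hdfs:///tmp/checkpoints/paths") returns a StreamingQuery; calling awaitTermination() on it keeps the application running. Collecting only the paths keeps driver-side data small, while the JSON files themselves are read in parallel by the executors.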
On Sun, Jun 9, 2019 at 11:03 PM vaquar khan <va...@gmail.com> wrote:
--
Thanks
Deepak
Re: Read hdfs files in spark streaming
Posted by vaquar khan <va...@gmail.com>.
Hi Deepak,
You can use textFileStream.
https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
Please start using Stack Overflow to ask questions, so that other people
can benefit from the answers.
Regards,
Vaquar khan
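For comparison with the suggestion above: textFileStream belongs to the older DStream API described in the linked guide. It watches a directory and processes files that appear in it; it does not read file paths out of Kafka messages. A minimal sketch, assuming the files are JSON Lines (one JSON object per line) and using a hypothetical directory and hypothetical function names:

```python
import json
from typing import Any, Optional


def parse_json_line(line: str) -> Optional[Any]:
    """Parse one line of a JSON Lines file; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    try:
        return json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def start_directory_stream(directory: str, batch_seconds: int = 10):
    """Watch `directory` with the DStream API and print parsed records each batch."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext.getOrCreate()
    ssc = StreamingContext(sc, batch_seconds)
    # By default only files that show up in `directory` after the stream starts are processed.
    records = (
        ssc.textFileStream(directory)
        .map(parse_json_line)
        .filter(lambda record: record is not None)
    )
    records.pprint()
    ssc.start()
    return ssc
```

start_directory_stream("hdfs:///data/incoming") returns the running StreamingContext; call awaitTermination() on it to block. Files should be moved or renamed into the directory atomically so that partially written files are not picked up.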
On Sun, Jun 9, 2019, 8:08 AM Deepak Sharma <de...@gmail.com> wrote: