Posted to user@spark.apache.org by Deepak Sharma <de...@gmail.com> on 2019/06/09 13:08:12 UTC

Read hdfs files in spark streaming

I am using a Spark Streaming application to read from Kafka.
The value of each Kafka message is the path to an HDFS file.
I am using Spark 2.x with spark.readStream.
What is the best way to read this path in Spark Streaming and then read the
JSON stored at that HDFS path, maybe using spark.read.json, into a DataFrame
inside the Spark Streaming app?
Thanks a lot in advance.

-- 
Thanks
Deepak

Re: Read hdfs files in spark streaming

Posted by nitin jain <ni...@gmail.com>.
Hi Deepak,
Please let us know how you managed it.

Thanks,
NJ

On Mon, Jun 10, 2019 at 4:42 PM Deepak Sharma <de...@gmail.com> wrote:

> Thanks All.
> I managed to get this working.
> Marking this thread as closed.

Re: Read hdfs files in spark streaming

Posted by Deepak Sharma <de...@gmail.com>.
Thanks All.
I managed to get this working.
Marking this thread as closed.

On Mon, Jun 10, 2019 at 4:14 PM Deepak Sharma <de...@gmail.com> wrote:

> This is a project requirement: the file paths are streamed to a Kafka
> topic.
> It seems this is not possible with Spark Structured Streaming.


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

Re: Read hdfs files in spark streaming

Posted by Deepak Sharma <de...@gmail.com>.
This is a project requirement: the file paths are streamed to a Kafka
topic.
It seems this is not possible with Spark Structured Streaming.


On Mon, Jun 10, 2019 at 3:59 PM Shyam P <sh...@gmail.com> wrote:

> Hi Deepak,
> Why are you getting the paths from a Kafka topic? Is there a specific reason to do so?
>
> Regards,
> Shyam

-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

Re: Read hdfs files in spark streaming

Posted by Shyam P <sh...@gmail.com>.
Hi Deepak,
Why are you getting the paths from a Kafka topic? Is there a specific reason to do so?

Regards,
Shyam

On Mon, Jun 10, 2019 at 10:44 AM Deepak Sharma <de...@gmail.com>
wrote:

> The context is different here.
> The file paths arrive as messages in a Kafka topic.
> Spark (Structured) Streaming consumes from this topic.
> It then has to take the value from each message, which is the path to a
> file, and read the JSON stored at that location into another DataFrame.
>
> Thanks
> Deepak

Re: Read hdfs files in spark streaming

Posted by Deepak Sharma <de...@gmail.com>.
The context is different here.
The file paths arrive as messages in a Kafka topic.
Spark (Structured) Streaming consumes from this topic.
It then has to take the value from each message, which is the path to a
file, and read the JSON stored at that location into another DataFrame.

Thanks
Deepak

On Sun, Jun 9, 2019 at 11:03 PM vaquar khan <va...@gmail.com> wrote:

> Hi Deepak,
>
> You can use textFileStream.
>
> https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
>
> Please start asking such questions on Stack Overflow so that other people
> also benefit from the answers.
>
>
> Regards,
> Vaquar khan

-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
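
One way to sketch the flow described in this message (consume paths from
Kafka, then load the JSON at each path into another DataFrame) is to do the
file read once per micro-batch on the driver. The Python sketch below assumes
Spark 2.4 or later, where foreachBatch is available, and assumes the Kafka
source package (spark-sql-kafka-0-10) is on the classpath. The thread does not
say how the poster solved it, so this is not necessarily that solution. The
names distinct_paths, make_batch_handler, start_query and the process callback
are hypothetical, as are the server, topic and checkpoint arguments.

```python
def distinct_paths(values):
    """Return the non-empty paths in first-seen order, without duplicates."""
    seen = set()
    result = []
    for value in values:
        path = (value or "").strip()  # Kafka values may be null or padded
        if path and path not in seen:
            seen.add(path)
            result.append(path)
    return result


def make_batch_handler(spark, process):
    """Build a foreachBatch callback that loads the JSON files named in a batch.

    `process` is a hypothetical user callback taking (json_df, batch_id).
    """
    def handle_batch(batch_df, batch_id):
        # foreachBatch invokes this function on the driver, so collecting the
        # paths and calling spark.read here is allowed.
        rows = batch_df.select("path").collect()
        paths = distinct_paths(row["path"] for row in rows)
        if paths:
            # Batch read of every file referenced in this micro-batch.
            json_df = spark.read.json(paths)
            process(json_df, batch_id)
    return handle_batch


def start_query(spark, bootstrap_servers, topic, checkpoint_dir, process):
    """Start a streaming query that reads HDFS paths from a Kafka topic."""
    paths_stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        # The Kafka value column is binary; cast it to get the path string.
        .selectExpr("CAST(value AS STRING) AS path")
    )
    return (
        paths_stream.writeStream
        .foreachBatch(make_batch_handler(spark, process))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call might look like
start_query(spark, "broker:9092", "file-paths", "/tmp/checkpoints/paths",
lambda df, batch_id: df.show()), where all four values are placeholders.
Collecting the paths to the driver is reasonable only while each micro-batch
carries a modest number of messages, and spark.read.json fails if a listed
path does not exist.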

Re: Read hdfs files in spark streaming

Posted by vaquar khan <va...@gmail.com>.
Hi Deepak,

You can use textFileStream.

https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html

Please start asking such questions on Stack Overflow so that other people
also benefit from the answers.


Regards,
Vaquar khan

On Sun, Jun 9, 2019, 8:08 AM Deepak Sharma <de...@gmail.com> wrote:

> I am using a Spark Streaming application to read from Kafka.
> The value of each Kafka message is the path to an HDFS file.
> I am using Spark 2.x with spark.readStream.
> What is the best way to read this path in Spark Streaming and then read
> the JSON stored at that HDFS path, maybe using spark.read.json, into a
> DataFrame inside the Spark Streaming app?
> Thanks a lot in advance.
>
> --
> Thanks
> Deepak
>
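
For context on the textFileStream suggestion: in the DStream API,
StreamingContext.textFileStream(directory) watches a directory for newly
created files and emits their lines. It therefore fits when the files all land
under one known directory, rather than at arbitrary paths delivered through
Kafka. A minimal Python sketch, assuming line-delimited JSON files;
parse_json_line and json_records_from_directory are hypothetical helper names:

```python
import json


def parse_json_line(line):
    """Parse one line of a JSON-lines file; return None if blank or malformed."""
    if not line.strip():
        return None
    try:
        return json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def json_records_from_directory(ssc, directory):
    """Return a DStream of parsed records from new files appearing in `directory`.

    `ssc` is an existing pyspark.streaming.StreamingContext.
    """
    lines = ssc.textFileStream(directory)
    return lines.map(parse_json_line).filter(lambda record: record is not None)
```

With a StreamingContext named ssc, calling
json_records_from_directory(ssc, "hdfs:///some/landing/dir").pprint() before
ssc.start() would print a sample of each batch; the directory is a
placeholder.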
