Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/01/15 22:41:59 UTC

can HDFS be a streaming source like Kafka in Spark 2.2.0?

Hi All,

I am wondering if HDFS can be a streaming source like Kafka in Spark 2.2.0?
For example, can I have stream1 read from Kafka and write to HDFS, and
stream2 read from HDFS and write back to Kafka, so that stream2 pulls the
latest updates written by stream1?

Thanks!

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by ayan guha <gu...@gmail.com>.

http://spark.apache.org/docs/1.0.0/streaming-programming-guide.html#input-sources


On Tue, Jan 16, 2018 at 3:50 PM, kant kodali <ka...@gmail.com> wrote:

> Got it! What about overwriting the same file instead of appending?


-- 
Best Regards,
Ayan Guha

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by Gourav Sengupta <go...@gmail.com>.

Would it not be like appending lines to the same file in that case?

On Tue, Jan 16, 2018 at 4:50 AM, kant kodali <ka...@gmail.com> wrote:

> Got it! What about overwriting the same file instead of appending?

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by kant kodali <ka...@gmail.com>.

Got it! What about overwriting the same file instead of appending?

On Mon, Jan 15, 2018 at 7:47 PM, Gourav Sengupta <go...@gmail.com>
wrote:

> What Gerard means is that if you are adding new files to the same base
> path (key), then it's fine, but if you are appending lines to the same
> file, the changes will not be picked up.
>
> Regards,
> Gourav Sengupta

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by Gourav Sengupta <go...@gmail.com>.

What Gerard means is that if you are adding new files to the same base
path (key), then it's fine, but if you are appending lines to the same
file, the changes will not be picked up.

Regards,
Gourav Sengupta
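
To illustrate the "add new files, do not append" point above, here is a minimal sketch in plain Python of publishing each update as a new, uniquely named file. It assumes a local POSIX filesystem where the staging and watched directories are on the same filesystem; on HDFS the analogous step would be a rename into the monitored directory. The function name `publish_atomically` and the file naming scheme are hypothetical, chosen only for illustration.

```python
import os
import uuid


def publish_atomically(data, staging_dir, watched_dir, prefix="part"):
    """Write data to a new, uniquely named file that appears in watched_dir in one step.

    Hypothetical helper: the name and the naming scheme are illustrative only.
    """
    # A fresh name per update means no existing file is appended to or overwritten.
    name = "{}-{}.txt".format(prefix, uuid.uuid4().hex)
    staging_path = os.path.join(staging_dir, name)
    final_path = os.path.join(watched_dir, name)

    # Write the complete file outside the watched directory first,
    # so a directory monitor never sees a half-written file.
    with open(staging_path, "w") as f:
        f.write(data)

    # Rename into place. os.replace is atomic on POSIX when both paths
    # are on the same filesystem (an assumption of this sketch).
    os.replace(staging_path, final_path)
    return final_path
```

Calling `publish_atomically("line 1\n", "/tmp/staging", "/tmp/watched")` twice would leave two separate files in the watched directory, each of which a directory monitor could pick up as a new file.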

On Tue, Jan 16, 2018 at 12:20 AM, kant kodali <ka...@gmail.com> wrote:

> Hi,
>
> I am not sure I understand. Any examples?

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by kant kodali <ka...@gmail.com>.

Hi,

I am not sure I understand. Any examples?

On Mon, Jan 15, 2018 at 3:45 PM, Gerard Maas <ge...@gmail.com> wrote:

> Hi,
>
> You can monitor a filesystem directory as a streaming source as long as
> the files placed there are atomically copied/moved into the directory.
> Updating the files is not supported.
>
> kr, Gerard.

Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

Posted by Gerard Maas <ge...@gmail.com>.

Hi,

You can monitor a filesystem directory as a streaming source as long as
the files placed there are atomically copied/moved into the directory.
Updating the files is not supported.

kr, Gerard.
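
For reference, a minimal sketch of what the "stream2" side could look like with Structured Streaming in PySpark, using the text file source and the Kafka sink. This is an illustration under assumptions, not a tested deployment: `spark` is assumed to be an existing SparkSession with the Kafka integration package available, and the function name, directory, broker address, topic, and checkpoint location are hypothetical placeholders supplied by the caller.

```python
def start_hdfs_to_kafka(spark, input_dir, bootstrap_servers, topic, checkpoint_dir):
    """Start a streaming query that forwards lines of new files in input_dir to Kafka.

    `spark` is assumed to be an existing SparkSession; every other
    argument is a hypothetical placeholder provided by the caller.
    """
    # The file source watches input_dir and reads files that newly appear there.
    # The "text" format yields one row per line in a string column named "value".
    lines = spark.readStream.format("text").load(input_dir)

    # The Kafka sink reads the record payload from a "value" column, which the
    # text source already provides, so no extra projection is done here.
    return (
        lines.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call might look like `start_hdfs_to_kafka(spark, "hdfs:///data/stream1-out", "broker:9092", "updates", "hdfs:///checkpoints/stream2")`, where all four string values are made up for the example. Taking the session as an argument keeps the sketch independent of how the session is created.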

On Mon, Jan 15, 2018 at 11:41 PM, kant kodali <ka...@gmail.com> wrote:

> Hi All,
>
> I am wondering if HDFS can be a streaming source like Kafka in Spark
> 2.2.0? For example, can I have stream1 read from Kafka and write to HDFS,
> and stream2 read from HDFS and write back to Kafka, so that stream2 pulls
> the latest updates written by stream1?
>
> Thanks!
>