Posted to user@storm.apache.org by Jakes John <ja...@gmail.com> on 2016/06/29 00:45:11 UTC

Storm HDFS multiple writers

Hi,
      I would like to know how the Storm HDFS bolt works. As far as I
have read, HDFS doesn't support multiple concurrent writers on a file,
i.e., multiple writers cannot write to the same file at the same time. So
how does it work when multiple bolts try to write to a file in HDFS at
the same time? I read somewhere that multiple files are created and then
merged at the end. Is that true? Who does the merging?

Thanks,
Jakes

Re: Storm HDFS multiple writers

Posted by Aaron Niskodé-Dossett <do...@gmail.com>.
Yes, the latency would prevent writing every event to HDFS as it's received.
However, the bolt does provide several tunable options for batching writes
and syncs to HDFS; please see the README file in external/storm-hdfs for
more details.  Between those options and the ability to parallelize the
bolt, you should be able to reach something quite acceptable, especially if
your use case is cold storage.  I've used the bolt in many different
circumstances and have always been able to get about 1 second of latency.
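
To make the batching idea concrete, here is a minimal, self-contained
sketch of count-based syncing: records are buffered and pushed out in one
operation every N writes instead of once per event. The class and method
names (BatchedSyncWriter, write, sync) are hypothetical and exist only for
illustration; this is not the storm-hdfs implementation, and the in-memory
list merely stands in for a durable HDFS sync. The README in
external/storm-hdfs documents the bolt's real options.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of count-based sync batching; not storm-hdfs code. */
final class BatchedSyncWriter {
    private final int syncEvery;                       // sync after this many buffered records
    private final List<String> buffer = new ArrayList<>();
    // Stands in for data made durable on HDFS; one entry per sync call that had data.
    private final List<List<String>> synced = new ArrayList<>();

    BatchedSyncWriter(int syncEvery) {
        if (syncEvery <= 0) {
            throw new IllegalArgumentException("syncEvery must be positive");
        }
        this.syncEvery = syncEvery;
    }

    /** Buffers one record and syncs once the batch is full. */
    void write(String record) {
        buffer.add(record);
        if (buffer.size() >= syncEvery) {
            sync();
        }
    }

    /** Pushes all buffered records out in a single operation; no-op when empty. */
    void sync() {
        if (buffer.isEmpty()) {
            return;
        }
        synced.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    int syncCount() { return synced.size(); }

    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        BatchedSyncWriter writer = new BatchedSyncWriter(1000);
        for (int i = 0; i < 2500; i++) {
            writer.write("event-" + i);
        }
        System.out.println("syncs=" + writer.syncCount() + " pending=" + writer.pending());
        writer.sync(); // e.g. triggered by a timer or at shutdown
        System.out.println("syncs=" + writer.syncCount() + " pending=" + writer.pending());
    }
}
```

With a batch size of 1000, the 2500 small records above cost two syncs
rather than 2500, and the remaining 500 wait for the next sync. A larger
batch raises throughput at the price of latency, which is the trade-off the
tunable options let you choose.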

Best, Aaron

Re: Storm HDFS multiple writers

Posted by Jakes John <ja...@gmail.com>.
Thanks for the response. I read that HDFS gets its highest throughput only
with long streaming writes, and that its latency is huge. In Storm's case,
messages are very small. Will this affect the throughput of the system?
Have you seen any other issues with storm-hdfs because of the speed
mismatch?

Has the Storm community suggested anything else for cold storage of
streaming data?

Re: Storm HDFS multiple writers

Posted by Aaron Niskodé-Dossett <do...@gmail.com>.
No, files are not merged. In general, you specify a directory and a file
naming convention, but each writer writes to its own file.
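
As a rough illustration of why this avoids the single-writer problem, the
sketch below builds one file name per writer task inside a shared
directory. The name layout and all identifiers (PerTaskFileNames,
fileNameFor, the directory and prefix values) are assumptions made up for
this example, not the storm-hdfs API; the point is only that putting the
task id in the name gives every writer a distinct file.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch, not the storm-hdfs API: one output file per writer task. */
final class PerTaskFileNames {

    /**
     * Builds a file name unique to one bolt task. The layout is an assumption
     * chosen for illustration:
     * dir/prefix + componentId-taskId-rotation-timestampMs + extension
     */
    static String fileNameFor(String dir, String prefix, String componentId,
                              int taskId, long rotation, long timestampMs,
                              String extension) {
        return dir + "/" + prefix + componentId + "-" + taskId + "-" + rotation
                + "-" + timestampMs + extension;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Set<String> names = new LinkedHashSet<>();
        // Four parallel tasks of the same bolt, all writing into the same directory.
        for (int taskId = 0; taskId < 4; taskId++) {
            names.add(fileNameFor("/storm/cold", "events-", "hdfs-bolt",
                    taskId, 0L, now, ".txt"));
        }
        names.forEach(System.out::println);
        // The set keeps only distinct names, so its size shows there is no collision.
        System.out.println("distinct files: " + names.size());
    }
}
```

Because no two tasks ever open the same path, HDFS never sees concurrent
writers on a single file, and nothing needs to be merged afterwards.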
