Posted to user@flume.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2014/02/26 19:18:17 UTC

Does File Channel write first?

Hi,

Does Flume's File Channel write to disk right away?  Or only after it
attempts to send data to the Sink? (e.g. if sending fails)

I think it's the former because Channel knows nothing about Sink/sending
AFAIK, but am hoping for the latter. :)

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

Re: Does File Channel write first?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Thanks for the link, Hari!

It looks like the only way to avoid having Flume write data destined for
the Sink to disk first is to use
https://issues.apache.org/jira/browse/FLUME-1227 , once it's committed.

I have a few related questions:

* How/when does Flume delete data from FileChannel?
* Does it delete individual "records" as soon as a "record" is sent out?
* Does it periodically purge batches of data?
* Is there a notion of TTL, like in Kafka, where data is not removed
explicitly by its consumer, but is deleted by Kafka Broker after some TTL?

* What happens with data that could not be sent?
* I know there is a retry and backoff mechanism.  But does Flume at some
point give up on trying to send some (old) piece of data out because it's
tried > N times or for > M seconds?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 26, 2014 at 2:15 PM, Hari Shreedharan <hshreedharan@cloudera.com
> wrote:

> File Channel is designed based on the Log Structured File System.
>
> Every time a source writes an event, the event is written to disk too, but
> it becomes available to the sinks only once that transaction is committed.
>
> On the sink side, every take is also written to disk (each take record
> simply holds the file id and offset of the original event), but the events
> are garbage collected if and only if the transaction is committed.
>
> Also, we actually fsync to disk only on commits. You can see more details of
> the design here:
> https://blogs.apache.org/flume/entry/apache_flume_filechannel
>
>
> Hari

Re: Does File Channel write first?

Posted by Hari Shreedharan <hs...@cloudera.com>.
File Channel is designed based on the Log Structured File System.

Every time a source writes an event, the event is written to disk too, but
it becomes available to the sinks only once that transaction is committed.

On the sink side, every take is also written to disk (each take record simply
holds the file id and offset of the original event), but the events are
garbage collected if and only if the transaction is committed.

Also, we actually fsync to disk only on commits. You can see more details of
the design here:
https://blogs.apache.org/flume/entry/apache_flume_filechannel
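
The write path described above can be sketched as a toy append-only log.
This is only an illustration under stated assumptions, not Flume's actual
code: the ToyFileChannel and Transaction classes, the record formats, and
the rollback behavior (taken events going back to the queue) are
hypothetical names and simplifications.

```python
import os
import tempfile
from collections import deque


class ToyFileChannel:
    """Toy append-only log. Hypothetical; not Flume's real FileChannel."""

    def __init__(self, path):
        self.log = open(path, "ab")
        self.offset = 0            # next write position in the log
        self.queue = deque()       # offsets of committed, untaken events
        self.reclaimable = set()   # offsets whose takes were committed
        self.fsyncs = 0

    def append(self, record):
        # Every record is appended to the log; nothing is updated in place.
        start = self.offset
        self.log.write(record + b"\n")
        self.offset += len(record) + 1
        return start

    def sync(self):
        self.log.flush()
        os.fsync(self.log.fileno())
        self.fsyncs += 1


class Transaction:
    def __init__(self, channel):
        self.ch = channel
        self.puts = []    # offsets of events written by this transaction
        self.takes = []   # offsets of events taken by this transaction

    def put(self, body):
        # Written to the log right away, but not yet visible to sinks.
        self.puts.append(self.ch.append(b"PUT " + body))

    def take(self):
        if not self.ch.queue:
            return None
        offset = self.ch.queue.popleft()
        # A take record holds only a pointer to the original event.
        self.ch.append(b"TAKE %d" % offset)
        self.takes.append(offset)
        return offset

    def commit(self):
        self.ch.append(b"COMMIT")
        self.ch.sync()                          # fsync only on commit
        self.ch.queue.extend(self.puts)         # puts become visible
        self.ch.reclaimable.update(self.takes)  # takes may be collected
        self.puts, self.takes = [], []

    def rollback(self):
        self.ch.append(b"ROLLBACK")
        # Assumption: taken events return to the head of the queue, in order.
        self.ch.queue.extendleft(reversed(self.takes))
        self.puts, self.takes = [], []


ch = ToyFileChannel(os.path.join(tempfile.mkdtemp(), "log-1"))

source_tx = Transaction(ch)
source_tx.put(b"event-a")
visible_before_commit = len(ch.queue)   # the sink cannot see the event yet
source_tx.commit()
visible_after_commit = len(ch.queue)

sink_tx = Transaction(ch)
taken = sink_tx.take()
sink_tx.rollback()                      # pretend delivery failed
after_rollback = len(ch.queue)

sink_tx = Transaction(ch)
sink_tx.take()
sink_tx.commit()                        # only now is the event reclaimable
ch.log.close()
```

In this sketch the put reaches the log before the commit, yet the sink sees
nothing until the commit, and a rolled-back take leaves the event in place.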


Hari

