Posted to dev@flume.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2014/03/02 22:01:41 UTC

What do people think about the patch FLUME-2321?

Hi,

I implemented a new feature for Flume
(FLUME-2321 <https://issues.apache.org/jira/browse/FLUME-2321>).
I'd like to know what people think about it and what the process is for
getting a new feature accepted.
This is the first time I've contributed to an Apache project and I don't
really know how it works. Or maybe it's just that nobody is interested in it,
hehe.

On another note, I'm coding a new "tail" source, and I don't want to make
the same mistakes in the future.

Thank you,

Guillermo Ortiz.

Re: What do people think about the patch FLUME-2321?

Posted by gortiz <go...@pragsis.com>.
Hi,

A patch with this has been available since yesterday: FLUME-2344-0.patch.
If I upgraded the Java version, people on Java 6 couldn't use Flume,
and there are a lot of people on Java 6.

It would be possible to read old log files by their date; if the
patch is accepted, I'll think about that small improvement.

Thank you.

On 11/03/14 19:01, Otis Gospodnetic wrote:
> Hi,
>
> On Tue, Mar 11, 2014 at 4:05 AM, gortiz <go...@pragsis.com> wrote:
>
>> Right.  Flume will miss any data that was logged while it was down because
>> Flume simply uses tail -F with ExecSource.
>>
>> *Your implementation remembers the last file (inode?) it tailed + position
>> in that file?*
>> -It remembers only the last rotate, if the file rotates more than once,
>> you'll lose data. I couldn't use inodes because it's neccesary Java7, Flume
>> is developed in Java6, so I save the date of last modification to control
>> the last rotate file and the offset until I read last time.
>>
> Is Java 6 support really necessary?  Java 6 is EOL.  Java 8 release is
> imminent, I believe.
> If tracking inode vs. last mod date brings in some advantage, I'd consider
> switching to that.
>
> *  What happens when multiple log files are rotated while Flume agent was
>> down?  Does your implementation know how to:
>> 1) read the last tailed file from where it stopped all the way to the end
>> 2) read all files that were completely missed from beginning to the end
>> 3) start tailing the "active" log file*
>>
>
>> -I do case number 1 and half of case 2. When it ended to read the rotate
>> file, it starts to read the new file. So it could read XXX.log.1 and when
>> it ends, will continue with XXX.log. It was outside of our scope to read
>> all the rotate file, so it lost XXX.log.2 if it wasn't read.
>> if you want to could do case 3 as well changing manually the
>> XXX.checkpoint, Here, I save the information about file which we're
>> watching, the last offset you read and the last date of last rotation.
>>
> If you save the date when you last read from a file, then I think you could
> just could finish reading the file you were last reading and then look at
> all other files and read all files with newer last mod date, beginning to
> end. No?
>
> *Assuming yes, yes, and yes, can one configure:
>> A) if 3) should start happening right away (while 1) and 2) are happening
>> "in the background)
>> B) or whether 1), 2), and 3) should happen sequentially**
>> *B case with restrictions I said.
>>
>>
>>   *The A) use case is very handy when the most recent data is much more
>> valuable than old data (e.g. performance metrics) and thus you'd rather
>> start sending new data first and backfill old data later (or in parallel).*
>> -The point it that I need to get the data sorted by date and I assume that
>> if we have our server down
>> too much time for that files could rotate more than once, we are too lazy.
>> So, I just tried to solve
>> doesn't lose data when there's short offset and the problem with tail from
>> linux. I really like your suggestion
>> number 2, I could think about it, but in the future.
>>
> +1 for the simple version now and more sophisticated later.
>
>> *
>> Have you compared your approach+impl with
>> http://commons.apache.org/proper/commons-io/apidocs/org/
>> apache/commons/io/input/Tailer.html?
>> http://grepcode.com/file/repo1.maven.org/maven2/
>> commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java
>> *They really call to tail unix command, so, it has the same problem that
>> Source which you can exec a tail.
>>
> Ahaaa, I did not realize this!
>
>
>> About the problem with tail from UNIX, I'm not pretty sure because I
>> didn't check that code, but I read somewhere that it was a patch about
>>   that command
>> where could happen that it could lose data which wasn't apply some
>> distribution or something like that. I should look for more information
>> about it.
>>
> I don't know anything about this....
>
> Is this project available anywhere?
> Would it make sense to add a new Flume Source that uses it?
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>


-- 
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_


Re: What do people think about the patch FLUME-2321?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Tue, Mar 11, 2014 at 4:05 AM, gortiz <go...@pragsis.com> wrote:

> Right.  Flume will miss any data that was logged while it was down because
> Flume simply uses tail -F with ExecSource.
>
> *Your implementation remembers the last file (inode?) it tailed + position
> in that file?*
> -It remembers only the last rotate, if the file rotates more than once,
> you'll lose data. I couldn't use inodes because it's neccesary Java7, Flume
> is developed in Java6, so I save the date of last modification to control
> the last rotate file and the offset until I read last time.
>

Is Java 6 support really necessary?  Java 6 is EOL.  Java 8 release is
imminent, I believe.
If tracking inode vs. last mod date brings in some advantage, I'd consider
switching to that.
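
For reference, a minimal sketch of what inode-style tracking could look like on Java 7 or later, using the NIO.2 file key. The class and method names here are hypothetical and are not taken from Flume or from the patch; the sketch only shows that a rotated (renamed) file keeps its key while a newly created file at the old path gets a different one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Flume or of the patch under discussion.
final class FileIdentity {

    // Returns an opaque key identifying the underlying file. On typical
    // Unix file systems it is derived from the device ID and inode; the
    // API allows it to be null where no such key is available.
    static Object fileKeyOf(Path path) throws IOException {
        BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class);
        return attrs.fileKey();
    }

    // True when the file currently at 'path' is the one whose key was
    // remembered. A rename does not change the key, so this can be used
    // to recognize the previously tailed file under its rotated name.
    static boolean isSameFile(Object rememberedKey, Path path) throws IOException {
        return rememberedKey != null && rememberedKey.equals(fileKeyOf(path));
    }
}
```

Because the key may be null on some platforms, a source built this way would probably still need the last-modified date as a fallback.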

*  What happens when multiple log files are rotated while Flume agent was
> down?  Does your implementation know how to:
> 1) read the last tailed file from where it stopped all the way to the end
> 2) read all files that were completely missed from beginning to the end
> 3) start tailing the "active" log file*
>


> -I do case number 1 and half of case 2. When it ended to read the rotate
> file, it starts to read the new file. So it could read XXX.log.1 and when
> it ends, will continue with XXX.log. It was outside of our scope to read
> all the rotate file, so it lost XXX.log.2 if it wasn't read.
> if you want to could do case 3 as well changing manually the
> XXX.checkpoint, Here, I save the information about file which we're
> watching, the last offset you read and the last date of last rotation.
>

If you save the date when you last read from a file, then I think you could
just finish reading the file you were last reading and then look at
all other files and read all files with a newer last mod date, beginning to
end. No?
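
A rough sketch of that idea, assuming the checkpoint stores a last-read timestamp in milliseconds: pick the files modified after the checkpoint and replay them oldest first. The class name and method are hypothetical, and the file that was being tailed when the agent stopped is assumed to be finished separately from its saved offset.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical planner, not part of Flume or of the patch under discussion.
final class CatchUpPlanner {

    // Returns the regular files modified after the checkpoint time,
    // ordered from oldest to newest so events stay roughly sorted by date.
    static List<File> filesToReplay(File[] candidates, final long checkpointMillis) {
        List<File> newer = new ArrayList<File>();
        for (File f : candidates) {
            if (f.isFile() && f.lastModified() > checkpointMillis) {
                newer.add(f);
            }
        }
        Collections.sort(newer, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                return Long.compare(a.lastModified(), b.lastModified());
            }
        });
        return newer;
    }
}
```

Last-modified times can have coarse granularity on some file systems, so two rotations in quick succession may not be distinguishable this way.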

*Assuming yes, yes, and yes, can one configure:
> A) if 3) should start happening right away (while 1) and 2) are happening
> "in the background)
> B) or whether 1), 2), and 3) should happen sequentially**
> *B case with restrictions I said.
>
>
>  *The A) use case is very handy when the most recent data is much more
> valuable than old data (e.g. performance metrics) and thus you'd rather
> start sending new data first and backfill old data later (or in parallel).*
> -The point it that I need to get the data sorted by date and I assume that
> if we have our server down
> too much time for that files could rotate more than once, we are too lazy.
> So, I just tried to solve
> doesn't lose data when there's short offset and the problem with tail from
> linux. I really like your suggestion
> number 2, I could think about it, but in the future.
>

+1 for the simple version now and more sophisticated later.

>
> *
> Have you compared your approach+impl with
> http://commons.apache.org/proper/commons-io/apidocs/org/
> apache/commons/io/input/Tailer.html?
> http://grepcode.com/file/repo1.maven.org/maven2/
> commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java
> *They really call to tail unix command, so, it has the same problem that
> Source which you can exec a tail.
>

Ahaaa, I did not realize this!


> About the problem with tail from UNIX, I'm not pretty sure because I
> didn't check that code, but I read somewhere that it was a patch about
>  that command
> where could happen that it could lose data which wasn't apply some
> distribution or something like that. I should look for more information
> about it.
>

I don't know anything about this....

Is this project available anywhere?
Would it make sense to add a new Flume Source that uses it?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/




Re: What do people think about the patch FLUME-2321?

Posted by gortiz <go...@pragsis.com>.
Right.  Flume will miss any data that was logged while it was down because
Flume simply uses tail -F with ExecSource.

*Your implementation remembers the last file (inode?) it tailed + position
in that file?*
-It remembers only the last rotation; if the file rotates more than once, you'll lose data. I couldn't use inodes because that requires Java 7 and Flume is developed on Java 6, so I save the last-modified date to identify the last rotated file, plus the offset up to which I last read.


*  What happens when multiple log files are rotated while Flume agent was
down?  Does your implementation know how to:
1) read the last tailed file from where it stopped all the way to the end
2) read all files that were completely missed from beginning to the end
3) start tailing the "active" log file*
-I do case 1 and half of case 2. When it finishes reading the rotated file, it starts reading the new file. So it could read XXX.log.1 and, when that ends, continue with XXX.log. Reading all the rotated files was outside our scope, so XXX.log.2 is lost if it wasn't read.
If you want, you could do case 3 as well by manually editing XXX.checkpoint. There I save the file we're watching, the last offset read, and the date of the last rotation.
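
To make the checkpoint idea concrete, here is a minimal sketch of a record holding the three values described above and persisting them as a properties file. The class name, the property keys and the on-disk format are assumptions for illustration; the actual XXX.checkpoint format used by the patch is not shown in this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical checkpoint record; field names and file format are assumed.
final class TailCheckpoint {
    final String watchedFile;      // path of the file being tailed
    final long offset;             // bytes of that file already read
    final long lastRotationMillis; // last-modified time of the last rotated file

    TailCheckpoint(String watchedFile, long offset, long lastRotationMillis) {
        this.watchedFile = watchedFile;
        this.offset = offset;
        this.lastRotationMillis = lastRotationMillis;
    }

    // Writes the three values as key=value pairs.
    void save(File target) throws IOException {
        Properties p = new Properties();
        p.setProperty("file", watchedFile);
        p.setProperty("offset", Long.toString(offset));
        p.setProperty("lastRotation", Long.toString(lastRotationMillis));
        OutputStream out = new FileOutputStream(target);
        try {
            p.store(out, "tail source checkpoint");
        } finally {
            out.close();
        }
    }

    // Reads a checkpoint previously written by save().
    static TailCheckpoint load(File source) throws IOException {
        Properties p = new Properties();
        InputStream in = new FileInputStream(source);
        try {
            p.load(in);
        } finally {
            in.close();
        }
        return new TailCheckpoint(
                p.getProperty("file"),
                Long.parseLong(p.getProperty("offset")),
                Long.parseLong(p.getProperty("lastRotation")));
    }
}
```

A plain text format like this keeps the manual edit mentioned above practical, and the code stays within what Java 6 offers.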


*Assuming yes, yes, and yes, can one configure:
A) if 3) should start happening right away (while 1) and 2) are happening
"in the background")
B) or whether 1), 2), and 3) should happen sequentially*
-Case B, with the restrictions I mentioned.


  *The A) use case is very handy when the most recent data is much more
valuable than old data (e.g. performance metrics) and thus you'd rather
start sending new data first and backfill old data later (or in parallel).*
-The point is that I need the data sorted by date, and I assume that if our server is down
long enough for the files to rotate more than once, we've been too careless. So I just tried to avoid
losing data when there's a short gap, and to avoid the problem with Linux's tail. I really like your suggestion
number 2; I could think about it, but in the future.

*Have you compared your approach+impl with
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/Tailer.html?
http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java*
-They really call the Unix tail command, so it has the same problem as the Source with which you can exec a tail.

About the problem with the UNIX tail, I'm not really sure because I didn't check that code, but I read somewhere that there was a patch for that command,
for a case where it could lose data, and that it wasn't applied in some distributions, or something like that. I should look for more information about it.



On 10/03/14 20:50, Otis Gospodnetic wrote:
> Hi,
>
> On Mon, Mar 10, 2014 at 3:35 AM, gortiz <go...@pragsis.com> wrote:
>
>> Hi,
>>
>> About the tailing, I was checking the code of the tail from Linux and
>> there's some chance to lost data when the file rotates.
>>
> In case of Linux's own tail?
>
>
>> Plus, if Flume is stopped, there's not chance to recover the data when it
>> isn't getting the data. I have implemented and checkpoint mechanism to
>> recover the most data as possible is this happen.
>>
> Right.  Flume will miss any data that was logged while it was down because
> Flume simply uses tail -F with ExecSource.
>
> Your implementation remembers the last file (inode?) it tailed + position
> in that file?
>
> What happens when multiple log files are rotated while Flume agent was
> down?  Does your implementation know how to:
> 1) read the last tailed file from where it stopped all the way to the end
> 2) read all files that were completely missed from beginning to the end
> 3) start tailing the "active" log file
>
> Assuming yes, yes, and yes, can one configure:
> A) if 3) should start happening right away (while 1) and 2) are happening
> "in the background)
> B) or whether 1), 2), and 3) should happen sequentially
>
> The A) use case is very handy when the most recent data is much more
> valuable than old data (e.g. performance metrics) and thus you'd rather
> start sending new data first and backfill old data later (or in parallel).
>
> Have you compared your approach+impl with
> http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/Tailer.html?
>
> http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java
>
> Thanks,
> Otis
>
>
>
>> I think that Tailing for Flume is good enough it you're not worry to lose
>> any data, but it I needed to improve a little bit this feature.
>>
>> If you have more question, let me know.
>>
>> Guillermo Ortiz.




Re: What do people think about the patch FLUME-2321?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Mon, Mar 10, 2014 at 3:35 AM, gortiz <go...@pragsis.com> wrote:

> Hi,
>
> About the tailing, I was checking the code of the tail from Linux and
> there's some chance to lost data when the file rotates.
>

In case of Linux's own tail?


> Plus, if Flume is stopped, there's not chance to recover the data when it
> isn't getting the data. I have implemented and checkpoint mechanism to
> recover the most data as possible is this happen.
>

Right.  Flume will miss any data that was logged while it was down because
Flume simply uses tail -F with ExecSource.

Your implementation remembers the last file (inode?) it tailed + position
in that file?

What happens when multiple log files are rotated while Flume agent was
down?  Does your implementation know how to:
1) read the last tailed file from where it stopped all the way to the end
2) read all files that were completely missed from beginning to the end
3) start tailing the "active" log file

Assuming yes, yes, and yes, can one configure:
A) if 3) should start happening right away (while 1) and 2) are happening
"in the background")
B) or whether 1), 2), and 3) should happen sequentially

The A) use case is very handy when the most recent data is much more
valuable than old data (e.g. performance metrics) and thus you'd rather
start sending new data first and backfill old data later (or in parallel).

Have you compared your approach+impl with
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/Tailer.html?

http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java

Thanks,
Otis



>
> I think that Tailing for Flume is good enough it you're not worry to lose
> any data, but it I needed to improve a little bit this feature.
>
> If you have more question, let me know.
>
> Guillermo Ortiz.

Re: What do people think about the patch FLUME-2321?

Posted by gortiz <go...@pragsis.com>.
Hi,

About the tailing: I was checking the code of Linux's tail and
there's some chance of losing data when the file rotates.
Plus, if Flume is stopped, there's no chance to recover the data written
while it wasn't running. I have implemented a checkpoint mechanism
to recover as much data as possible if this happens.
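
As an illustration of the recovery step, a minimal sketch of resuming a read from a saved byte offset, assuming the offset comes from a checkpoint written before the agent stopped. The class and method names are hypothetical and this is not the code from the patch.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader, not part of Flume or of the patch under discussion.
final class OffsetReader {

    // Lines read in one pass plus the offset to store in the next checkpoint.
    static final class Result {
        final List<String> lines;
        final long newOffset;

        Result(List<String> lines, long newOffset) {
            this.lines = lines;
            this.newOffset = newOffset;
        }
    }

    // Reads every line from 'offset' to the current end of the file.
    static Result readFrom(File file, long offset) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            // A file shorter than the saved offset was probably truncated
            // or replaced, so start again from the beginning.
            long start = offset <= raf.length() ? offset : 0L;
            raf.seek(start);
            List<String> lines = new ArrayList<String>();
            String line;
            // readLine() maps each byte to one char, so this sketch is only
            // safe for single-byte encodings; it also returns a trailing
            // partial line that has no newline yet.
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
            return new Result(lines, raf.getFilePointer());
        } finally {
            raf.close();
        }
    }
}
```

Calling `readFrom` again with the returned `newOffset` picks up only what was appended in the meantime.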

I think that tailing in Flume is good enough if you're not worried about
losing any data, but I needed to improve this feature a little.

If you have more questions, let me know.

Guillermo Ortiz.

On 07/03/14 21:47, Otis Gospodnetic wrote:
> Hi Guillermo,
>
> I don't have the need for FLUME-2321, but maybe one of the devs can have a
> look.
>
> I am curious about that new tail source you mentioned, though.  Can you
> tell us more about what you are working on, how it is going to work, and
> how it will be better than the tailer form Apache Commons and ExecSource
> with tail -F ?
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>




Re: What do people think about the patch FLUME-2321?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Guillermo,

I don't have the need for FLUME-2321, but maybe one of the devs can have a
look.

I am curious about that new tail source you mentioned, though.  Can you
tell us more about what you are working on, how it is going to work, and
how it will be better than the tailer from Apache Commons and ExecSource
with tail -F?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Mar 2, 2014 at 4:01 PM, Guillermo Ortiz <ko...@gmail.com>wrote:

> Hi,
>
> I did a new feature for Flume
> (FLUME-2321<https://issues.apache.org/jira/browse/FLUME-2321>),
> I'd like to know what people think about it and how it's the mechanism to
> be accepted a new feature
> It's first time that I collaborate with an Apache Project and I don't
> really know how it works. Or maybe it's because nobody is interested on it,
> hehe.
>
> On another hand, I'm coding a new "tail" source, and I don't want to get
> the same mistakes in the future.
>
> Thank you,
>
> Guillermo Ortiz.
>