You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by Patrick Wendell <pw...@gmail.com> on 2012/11/01 05:47:30 UTC

Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client


> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 300
> > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line300>
> >
> >     This should be something like:
> >     
> >     int bufferSize = bufferMaxLines * bufferMaxLineLength;
> >     BufferedReader reader = new BufferedReader(new FileReader(nextFile), bufferSize);
> >     reader.mark(bufferSize);
> >     
> >     ...since the default buffer size in the BufferedReader constructor is 8124 in JDK 6 and that may not be enough to hold your bufferSize (or it may be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) is different in situations where the constructor arg is different than the arg passed to mark(), so rather than deal with that inconsistency it makes sense to do this.

Ya good catch.


> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 188
> > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line188>
> >
> >     If the line length exceeds the length of the BufferedReader, this reset() call will throw a java.io.IOException: Mark invalid
> >     
> >     Also, committed = true when you hit this case, since committed = false does not get set until right before the return statement.
> >     
> >     Therefore, assuming the client catches the exception and recovers from it, the next call to readLines(n) will simply skip over those lines and the data will be lost.
> >     
> >     So if there is an error in which the line is too long, since we have decided that we going to fail then we should fail explicitly and permanently:
> >     1. Throw a FlumeException indicating a too-long line was seen (do not try to reset the reader) and also:
> >     2. Permanently disable the SpoolingFileLineReader object so that the next time someone tries to call any operation on it, such as readLines(), commit(), or close(), throw an IllegalStateException indicating that it is no longer in a usable state.
> >     
> >     In a future revision of this implementation, we can do something a little more graceful, but we have to ensure that we don't lose data in any case.

Yep this should stop the world when it fails... my mistake. Added some tests to make sure this works.


> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java, line 363
> > <https://reviews.apache.org/r/6377/diff/5/?file=178801#file178801line363>
> >
> >     This test is failing for me. Not sure why, haven't dug into it much yet.

I can't yet reproduce this failure.


- Patrick


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12907
-----------------------------------------------------------


On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2012, 8:36 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> This patch adds a spooling directory based source. The  idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
> 
> This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
> 
> Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner.
> 
> 
> This addresses bug FlUME-1425.
>     https://issues.apache.org/jira/browse/FlUME-1425
> 
> 
> Diffs
> -----
> 
>   flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 
>   flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 
> 
> Diff: https://reviews.apache.org/r/6377/diff/
> 
> 
> Testing
> -------
> 
> Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.
> 
> 
> Thanks,
> 
> Patrick Wendell
> 
>