Posted to dev@flume.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2012/11/02 10:35:12 UTC

[jira] [Commented] (FLUME-1425) Create a SpoolDirectory Source and Client

    [ https://issues.apache.org/jira/browse/FLUME-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489333#comment-13489333 ] 

Mike Percy commented on FLUME-1425:
-----------------------------------

[~drizzt321]: Would love to get enhancements on top of this work after it's committed. You may want to file a JIRA for that.

[~pwendell@gmail.com]: I am still getting a unit test error on my Mac. I'll try to dig into it more tomorrow. This is the stack trace:

{noformat}
-------------------------------------------------------------------------------
Test set: org.apache.flume.client.avro.TestSpoolingFileLineReader
-------------------------------------------------------------------------------
Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.241 sec <<< FAILURE!
testBehaviorWithEmptyFile(org.apache.flume.client.avro.TestSpoolingFileLineReader)  Time elapsed: 0.007 sec  <<< FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.flume.client.avro.TestSpoolingFileLineReader.testBehaviorWithEmptyFile(TestSpoolingFileLineReader.java:396)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
  at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
  at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
{noformat}
                
> Create a SpoolDirectory Source and Client
> -----------------------------------------
>
>                 Key: FLUME-1425
>                 URL: https://issues.apache.org/jira/browse/FLUME-1425
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Patrick Wendell
>            Assignee: Patrick Wendell
>         Attachments: FileProcessingSource.java, FLUME-1425.avro-conf-file.txt, FLUME-1425.patch.v1.txt, FLUME-1425.v5.patch.txt, FLUME-1425.v6.patch.txt, FLUME-1425.v6.patch.txt, FLUME-1425.v7.patch.txt, FLUME-1425.v8.patch.txt
>
>
> The proposal is to create a small executable client which reads logs from a spooling directory and sends them to a Flume sink, then performs cleanup on the directory (either by deleting or moving the logs). It would make the following assumptions:
> - Files placed in the directory are uniquely named
> - Files placed in the directory are immutable
> The problem this is trying to solve is that there is currently no way to do guaranteed event delivery across Flume agent restarts when the data is being collected through an asynchronous source (and not directly from the client API). Say, for instance, you are using an exec("tail -F") source. If the agent restarts, whether due to an error or intentionally, tail may pick up at a new location and you lose the intermediate data.
> At the same time, there are users who want at-least-once semantics, and expect those to apply as soon as the data is written to disk by the initial logger process (e.g. Apache logs), not just once it has reached a Flume agent. This idea would bridge that gap, assuming the user is able to copy immutable logs to a spooling directory through a cron script or similar.
> The basic internal logic of such a client would be as follows:
> - Scan the directory for files
> - Choose a file and read through it, sending events to an agent
> - Close the file and delete it (or rename, or otherwise mark completed)
> That's about it. We could add sync-points to make recovery more efficient in the case of failure.
> A key question is whether this should be implemented as a standalone client or as a source. My instinct is actually to do this as a source, but there could be some benefit to not requiring an entire agent in order to run this, specifically that it would become platform independent and you could run it on Windows machines. Others I have talked to have also sided with a standalone executable.
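
The scan / read-and-send / mark-completed loop in the proposal could be sketched in Java, the language Flume is written in. This is a minimal illustration under stated assumptions, not the patch attached to this issue and not Flume's API. `EventSender` is a hypothetical interface standing in for delivery to an agent (a real client might wrap an Avro RPC client). Files are chosen in name order, and completed files are renamed with a hypothetical `.COMPLETED` suffix rather than deleted. The sketch also checks the two stated assumptions: it compares size and modification time before and after reading (immutability) and refuses to reuse an already-completed name (uniqueness). Because these checks run after the events are sent, they detect violations rather than prevent them. No sync-points are kept, so a crash mid-file means the whole file is re-sent on restart. Duplicates are possible but data is not lost, which matches the at-least-once goal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Minimal sketch of the proposed spooling-directory loop; not Flume's implementation. */
public final class SpoolDirectoryClient {
  /** Hypothetical stand-in for sending events to an agent; must throw unless delivered. */
  public interface EventSender {
    void send(List<String> batch) throws IOException;
  }

  /** Hypothetical suffix used to mark a file as completed. */
  static final String COMPLETED_SUFFIX = ".COMPLETED";

  private final Path spoolDir;
  private final EventSender sender;
  private final int batchSize;

  public SpoolDirectoryClient(Path spoolDir, EventSender sender, int batchSize) {
    this.spoolDir = spoolDir;
    this.sender = sender;
    this.batchSize = batchSize;
  }

  /** Scan the directory and choose a file: here, the smallest name not yet completed. */
  Optional<Path> chooseNextFile() throws IOException {
    List<Path> candidates = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir)) {
      for (Path p : files) {
        if (Files.isRegularFile(p) && !p.getFileName().toString().endsWith(COMPLETED_SUFFIX)) {
          candidates.add(p);
        }
      }
    }
    return candidates.stream().min(Comparator.comparing((Path p) -> p.getFileName().toString()));
  }

  /** Read the file, send its lines in batches, then mark it completed by renaming it. */
  void processFile(Path file) throws IOException {
    long sizeBefore = Files.size(file);
    FileTime mtimeBefore = Files.getLastModifiedTime(file);
    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      List<String> batch = new ArrayList<>();
      String line;
      while ((line = reader.readLine()) != null) {
        batch.add(line);
        if (batch.size() == batchSize) {
          sender.send(batch);
          batch = new ArrayList<>();
        }
      }
      if (!batch.isEmpty()) {
        sender.send(batch);
      }
    }
    // Immutability assumption: a changed size or mtime means the file was written to while read.
    if (Files.size(file) != sizeBefore || !Files.getLastModifiedTime(file).equals(mtimeBefore)) {
      throw new IllegalStateException("File modified while being read: " + file);
    }
    // Uniqueness assumption: a name that was already completed must not be reused.
    Path done = file.resolveSibling(file.getFileName() + COMPLETED_SUFFIX);
    if (Files.exists(done)) {
      throw new IllegalStateException("File name re-used: " + file.getFileName());
    }
    // Only after every batch was sent is the file marked completed.
    Files.move(file, done, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Drain the directory once; returns the number of files completed. */
  public int runOnce() throws IOException {
    int completed = 0;
    for (Optional<Path> next = chooseNextFile(); next.isPresent(); next = chooseNextFile()) {
      processFile(next.get());
      completed++;
    }
    return completed;
  }
}
```

Renaming instead of deleting keeps the completed name on disk, which is what makes the reuse check possible. A deleting variant would need some other record of names already processed.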
