Posted to user@flume.apache.org by 周梦想 <ab...@gmail.com> on 2013/02/21 04:37:22 UTC

How to implement a tail or tailDir source for flume-ng on Windows?

Hello,

flume-ng has no tail or tailDir source.
On Linux, the exec source can run the tail command,
but Windows has no tail command, so I have to write some code to
do the same work.
I want to read a file and, whenever new lines are appended to it, send
those lines to flume-ng.

Can someone give me some advice?

Thanks,
Andy
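For the tailDir case (following every file in a directory), one possible approach is to remember a byte offset per file and, on each polling pass, read only what has been appended since the last pass. The following is a minimal Python sketch under those assumptions; poll_dir and the layout of its offsets dictionary are hypothetical names for illustration, not part of Flume. Only complete lines (ending in a newline) are returned, so a line the application is still writing is picked up on a later pass.

```python
import os


def poll_dir(directory, offsets):
    """One tailDir-style polling pass over `directory` (hypothetical helper).

    `offsets` maps file name -> byte offset already consumed; it is updated
    in place. Returns a list of (file name, line) pairs for complete new lines.
    """
    new_lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        start = offsets.get(name, 0)
        if size < start:
            start = 0  # file was truncated or replaced: read it from the top again
        if size == start:
            continue  # nothing new in this file
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read(size - start)
        # Consume only up to the last newline; a partial line waits for the next pass.
        end = data.rfind(b"\n") + 1
        for line in data[:end].split(b"\n")[:-1]:
            new_lines.append((name, line + b"\n"))
        offsets[name] = start + end
    return new_lines
```

A caller would run poll_dir in a loop with a short sleep between passes, and could save offsets to disk so that a restart does not resend everything.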

Re: How to implement a tail or tailDir source for flume-ng on Windows?

Posted by 周梦想 <ab...@gmail.com>.
Hi Dan,
With the spooling directory source the file has to be renamed, so it isn't suited
to sending events to Flume while logs are still being written to the log file, is it?

thanks,
andy
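One way around the restriction described above, assuming the application can be made to roll its log (for example hourly), is to let it write to a live file outside the spool directory and move each finished file into the spool directory only after it has been closed. The spooling directory source expects the files it sees to be complete, uniquely named and never modified afterwards. The Python sketch below assumes the staging and spool directories are on the same volume, so os.rename is a single atomic move; ship_to_spool is a hypothetical helper, not part of Flume.

```python
import os
import uuid


def ship_to_spool(finished_path, spool_dir):
    """Move a finished (closed, rolled) log file into the spool directory.

    Hypothetical helper: the file is only moved in once the writer has
    closed it, because the spooling directory source expects files to be
    complete and unchanged once they appear there.
    """
    base = os.path.basename(finished_path)
    # A random suffix keeps names unique, since the source expects each
    # file name to be used only once.
    target = os.path.join(spool_dir, "%s.%s" % (base, uuid.uuid4().hex))
    # Atomic only when both paths are on the same volume.
    os.rename(finished_path, target)
    return target
```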

2013/2/21 dan young <da...@gmail.com>

> Have you checked out the spooling directory source? That's what we moved to
> from tail.
>
> Regards,
>
> Dan

Re: How to implement a tail or tailDir source for flume-ng on Windows?

Posted by dan young <da...@gmail.com>.
Have you checked out the spooling directory source? That's what we moved to
from tail.

Regards,

Dan
On Feb 20, 2013 8:38 PM, "周梦想" <ab...@gmail.com> wrote:

> hello,
>
> there isn't tail or tailDir source of flume-ng.
> exec source can run tail command on linux.
> but there is not a tail command on windows. So I have to write some code
> to do the same work.
> I want to read a file and if there is new lines of a  file, it sends the
> lines to flume-ng.
>
> some one give me some advice?
>
> Thanks,
> Andy
>
>

Re: How to implement a tail or tailDir source for flume-ng on Windows?

Posted by 周梦想 <ab...@gmail.com>.
Sorry, it seems to work after I changed the config file to:
agent1.sources.userlogsrc.command = C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py E:\\mydoc\\gamelog\\game.log

I removed the quotation marks (" ") around the command, and the Python process was created OK.
So the .bat file can also be run as the command.
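In the stack trace from Andy's earlier message, the program Flume tried to start is shown as "C:\Python27\python.exe with the leading quotation mark attached. That is consistent with the exec source splitting its command property on whitespace without any shell-style quote handling; this is an inference from the log, not something checked against the Flume source here. The Python sketch below mimics that assumed splitting to show why removing the quotes helps; split_command is a hypothetical stand-in, not a Flume API.

```python
import re


def split_command(command):
    """Split a command line on runs of whitespace, with no quote handling.

    Hypothetical stand-in for how the exec source appears to tokenize its
    command property (inferred from the error message, not verified).
    """
    return re.split(r"\s+", command.strip())


# Values as Flume sees them after the properties file turns "\\" into "\".
quoted = '"C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py d:\\data.log"'
unquoted = 'C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py d:\\data.log'

print(split_command(quoted)[0])    # "C:\Python27\python.exe  (quote is part of the program name)
print(split_command(unquoted)[0])  # C:\Python27\python.exe
```

Under the same assumption, a path containing spaces would also be split into several arguments, so paths without spaces are the safer choice for this property.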

As for the earlier problem of no file being generated in HDFS, maybe it is because
data.log is too small? It has only a few lines, while game.log is about
400MB.
And could it matter that I set agent1.channels.memch1.capacity = 10000?

I'll test this further.
Thanks all.
Andy

Re: How to implement a tail or tailDir source for flume-ng on Windows?

Posted by 周梦想 <ab...@gmail.com>.
Hi Juhani,
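I wrote a Python script, tail.py, as follows:
import sys
import time

def tail_f(f, interval=1.0):
    """Yield complete lines from f, waiting at end of file for more."""
    while True:
        where = f.tell()
        line = f.readline()
        if not line or not line.endswith('\n'):
            # No new data yet, or only part of a line has been written:
            # go back to where we were and try again after a pause.
            time.sleep(interval)
            f.seek(where)
        else:
            yield line

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in tail_f(f):
            sys.stdout.write(line)
            # Flush every line: when stdout is a pipe (as under the exec
            # source), Python buffers output, so a few short lines could
            # otherwise sit in the buffer and never reach Flume.
            sys.stdout.flush()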

tail.bat:
C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log

I changed the conf file to:
agent1.sources.userlogsrc.type = exec
agent1.sources.userlogsrc.command = "D:\\apache-flume-1.3.1-bin\\bin\\tail.bat"

This node tails the file and uses an Avro sink, which sends to another node whose source is Avro.
When I run flume.bat it reports no errors, and I can see that the connection is OK,
but no data is sent to flume-ng.

If I change the config file to:
agent1.sources.userlogsrc.command = "C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py d:\\data.log"

and run flume.bat, it reports this error:
2013-02-21 15:21:08,622 (pool-4-thread-1) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:284)] Failed while running command: "C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log"
java.io.IOException: Cannot run program ""C:\Python27\python.exe": CreateProcess error=2, ?????????
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:259)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: CreateProcess error=2, ?????????
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
        at java.lang.ProcessImpl.start(ProcessImpl.java:30)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 7 more
2013-02-21 15:21:08,651 (pool-4-thread-1) [INFO - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:307)] Command ["C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log"] exited with -1073741824

I don't understand why the exec source can't run the Python program.

Thanks,
Andy
2013/2/21 Juhani Connolly <ju...@cyberagent.co.jp>

> You'd want to just periodically stat the file to be tailed, checking for
> change in last modified/size, and read the difference out of it. You could
> always download the source for tail itself and see how it does it:
> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c
>
> If you're going to write this to feed data to flume you're better off
> having it send data over thrift to flume so you can resend it on failures.

Re: How to implement a tail or tailDir source for flume-ng on Windows?

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
You'd want to just periodically stat the file to be tailed, checking for 
change in last modified/size, and read the difference out of it. You 
could always download the source for tail itself and see how it does it: 
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c
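The polling approach suggested above (stat the file, and when its size grows, read only the new bytes) might look roughly like this Python sketch. It is an illustration under assumptions, not a port of coreutils tail: follow is a hypothetical name, it watches the size only (the modification time could be checked as well), it treats a shrinking file as truncated or replaced and starts over, and by default it begins at the current end of the file instead of replaying existing content.

```python
import os
import time


def follow(path, interval=1.0, start_at_end=True):
    """Yield complete lines appended to `path`, found by polling os.stat().

    Hypothetical helper. The file is reopened for each read instead of being
    held open, which may make it easier for the writer to rotate it on Windows.
    """
    offset = os.stat(path).st_size if start_at_end else 0
    pending = b""  # bytes of a line that has not been terminated yet
    while True:
        size = os.stat(path).st_size
        if size < offset:
            # The file shrank: assume it was truncated or replaced.
            offset, pending = 0, b""
        if size > offset:
            with open(path, "rb") as f:
                f.seek(offset)
                chunk = f.read(size - offset)
            offset += len(chunk)
            parts = (pending + chunk).split(b"\n")
            pending = parts.pop()  # an unfinished last line waits for more data
            for line in parts:
                yield line + b"\n"
        else:
            time.sleep(interval)
```

A caller would iterate over follow(path) and hand each line to Flume; starting at the end of the file avoids resending all of a large existing log (such as a 400MB game.log) every time the script restarts.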

If you're going to write this to feed data to flume you're better off 
having it send data over thrift to flume so you can resend it on failures.
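The point about resending on failures can be sketched independently of the transport. In this Python sketch, send stands for a hypothetical client call (for example one built on Thrift or Avro RPC) that raises an exception when delivery fails; the caller would advance its saved read offset only after send_with_retry returns True, so lines that were not delivered are read and sent again rather than lost. Which exception types should trigger a retry depends on the client, so they are a parameter here.

```python
import time


def send_with_retry(batch, send, max_attempts=5, backoff=0.5,
                    retry_on=(IOError, OSError)):
    """Deliver `batch` with the hypothetical transport `send`, retrying on failure.

    Returns True once a call to `send` succeeds, False if every attempt failed.
    """
    delay = backoff
    for attempt in range(1, max_attempts + 1):
        try:
            send(batch)
            return True
        except retry_on:
            if attempt == max_attempts:
                break
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    return False
```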
