Posted to user@flume.apache.org by ZORAIDA HIDALGO SANCHEZ <zo...@tid.es> on 2013/05/21 15:20:31 UTC

Spooling fileSuffix attribute ignored

Dear all,
I am using the spooling directory source's "fileSuffix" attribute so that a file is consumed by the source only once it has been completely uploaded into the spooling directory. However, files with no suffix are also appended to the channel and then processed by the sink.
My configuration:

tier1.sources  = s1
tier1.channels = c1
tier1.sinks    = s1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.s1.type     = spooldir
tier1.sources.s1.spoolDir = /home/user/flume/data
tier1.sources.s1.deletePolicy = immediate
tier1.sources.s1.batchSize = 1000
tier1.sources.s1.bufferMaxLines = 3000
tier1.sources.s1.fileHeader = true
tier1.sources.s1.fileSuffix=.COMPLETED

Is that ok?

Thanks.

________________________________

Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo.
This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx

RE: Spooling fileSuffix attribute ignored

Posted by Phil Scala <Ph...@globalrelay.net>.
Ah Mike, nice to see the ignore pattern ☺ That’s quite handy.

As for your question to me:
I am still using the 1.3.x version at the moment. As for comments on the spooling directory source: it is still early in development, but so far it has worked fairly well, though I have not thrown too much at it. My biggest test was dropping 300 files to be parsed, and it chugged through them very nicely.

I am looking at maybe taking a crack at https://issues.apache.org/jira/browse/FLUME-1899 (subdirectory support). We recently had some discussions, and some of the design may change and warrant implementing that.

One observation from some failure tests: if I kill -9 my agent process in the middle of processing a file, then when the agent is restarted the spool source basically starts again at the top of that same file. So there is a chance of duplicates. In my use this is good, but it may not be what others expect.

I will definitely report back with findings as I move along. This is a fast-moving project, so I will have some better “numbers” very soon.

Thanks
  Phil


Phil Scala
Software Developer / Architect
Global Relay

phil.scala@globalrelay.net

866.484.6630  |  info@globalrelay.net  |  globalrelay.com



Re: Spooling fileSuffix attribute ignored

Posted by Mike Percy <mp...@apache.org>.
You should check whether your version of Flume supports the ignorePattern
configuration param. The latest version on trunk does.

Mike
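
For readers on a Flume release that includes it, the ignorePattern parameter mentioned above could be used roughly as follows. This is only a sketch: the .tmp naming convention, and the idea that the uploader renames each file once the transfer has finished, are assumptions made for illustration and are not stated anywhere in this thread. The character class [.] is used instead of \. because a backslash is an escape character in Java properties files.

```properties
# Sketch only: assumes uploaders write to <name>.tmp and rename to <name> when done.
tier1.sources.s1.type          = spooldir
tier1.sources.s1.spoolDir      = /home/user/flume/data
# Regular expression for file names that the source should skip.
tier1.sources.s1.ignorePattern = ^.*[.]tmp$
```

With such a setup, a file would only become eligible for ingestion after the rename, which is atomic when it happens within the same filesystem.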



On Wed, May 22, 2013 at 12:43 AM, ZORAIDA HIDALGO SANCHEZ <zo...@tid.es> wrote:

> Oh! I see, then it was a misunderstanding. OK, so we will need to find a
> workaround. Thanks a lot.

Re: Spooling fileSuffix attribute ignored

Posted by Mike Percy <mp...@apache.org>.
Hi Phil,
Nice approach. How is the spooling directory source working for you? Any
thoughts on how it could be improved?

Mike



RE: Spooling fileSuffix attribute ignored

Posted by Phil Scala <Ph...@globalrelay.net>.
Hi,

Based on my use and understanding, the "fileSuffix" setting is simply the extension added to a file once it has been consumed and placed onto the channel. I don't think it was intended to indicate a completely uploaded file. In the dev newsgroup there was a discussion about having the spooler "wait" for a little while before ingesting a file. An ignore pattern may also be a good idea.

Currently I am using an upload directory that I monitor, and when lsof reports that a file is no longer in use/open, I move it to my spool directory.

HTH
Phil
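
To illustrate the two points above, here is a minimal configuration sketch. It assumes a Flume version in which deletePolicy is available, and the /home/user/flume/upload and /home/user/flume/spool paths are hypothetical. To my understanding of the source, fileSuffix only marks files that have already been fully ingested, and it is only applied when deletePolicy is never; with deletePolicy = immediate the file is deleted instead of renamed.

```properties
# Sketch only: directory names are hypothetical.
# Files are uploaded to /home/user/flume/upload (not watched by Flume) and
# moved into the spool directory by an external job once they are complete.
tier1.sources.s1.type         = spooldir
tier1.sources.s1.spoolDir     = /home/user/flume/spool
# Keep ingested files and rename them with the suffix below.
tier1.sources.s1.deletePolicy = never
# Appended to a file AFTER it has been consumed; it does not select input files.
tier1.sources.s1.fileSuffix   = .COMPLETED
```

Keeping both directories on the same filesystem lets the move be a plain rename, so the source should never see a partially written file.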


