Posted to dev@flume.apache.org by "Viktor Trako (JIRA)" <ji...@apache.org> on 2013/11/21 18:22:38 UTC

[jira] [Comment Edited] (FLUME-2241) Spooling Directory Source doesn't handle files with large-ish event data

    [ https://issues.apache.org/jira/browse/FLUME-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829109#comment-13829109 ] 

Viktor Trako edited comment on FLUME-2241 at 11/21/13 5:22 PM:
---------------------------------------------------------------

I can see that the same problem will occur with every character that UTF-8 encodes as 2 bytes, for example:

||Unicode||Character||UTF-8 (hex)||
|U+0080|	 |	c2 80|
|U+0081|	 |	c2 81|
|U+0082|	 |	c2 82|
|U+0083|	 |	c2 83|
|U+0084|	 |	c2 84|
|U+0085|	 |	c2 85|
|U+0086|	 |	c2 86|
|U+0087|	 |	c2 87|
|U+0088|	 |	c2 88|
|U+0089|	 |	c2 89|
|U+008A|	 |	c2 8a|
|U+008B|	 |	c2 8b|
|U+008C|	 |	c2 8c|
|U+008D|	 |	c2 8d|
|U+008E|	 |	c2 8e|
|U+008F|	 |	c2 8f|
|U+0090|	 |	c2 90|
|U+0091|	 |	c2 91|
|U+0092|	 |	c2 92|
|U+0093|	 |	c2 93|
|U+0094|	 |	c2 94|
|U+0095|	 |	c2 95|
|U+0096|	 |	c2 96|
|U+0097|	 |	c2 97|
|U+0098|	 |	c2 98|
|U+0099|	 |	c2 99|
|U+009A|	 |	c2 9a|
|U+009B|	 |	c2 9b|
|U+009C|	 |	c2 9c|
|U+009D|	 |	c2 9d|
|U+009E|	 |	c2 9e|
|U+009F|	 |	c2 9f|
|U+00A0|	 |	c2 a0|
|U+00A1|	¡|	c2 a1|
|U+00A2|	¢|	c2 a2|
|U+00A3|	£|	c2 a3|
|U+00A4|	¤|	c2 a4|
|U+00A5|	¥|	c2 a5|
|U+00A6|	¦|	c2 a6|
|U+00A7|	§|	c2 a7|
|U+00A8|	¨|	c2 a8|
|U+00A9|	©|	c2 a9|
|U+00AA|	ª|	c2 aa|
|U+00AB|	«|	c2 ab|
|U+00AC|	¬|	c2 ac|
|U+00AD|	|	c2 ad|
|U+00AE|	®|	c2 ae|
|U+00AF|	¯|	c2 af|
|U+00B0|	°|	c2 b0|
|U+00B1|	±|	c2 b1|
|U+00B2|	²|	c2 b2|
|U+00B3|	³|	c2 b3|
|U+00B4|	´|	c2 b4|
|U+00B5|	µ|	c2 b5|
|U+00B6|	¶|	c2 b6|
|U+00B7|	·|	c2 b7|
|U+00B8|	¸|	c2 b8|
|U+00B9|	¹|	c2 b9|
|U+00BA|	º|	c2 ba|
|U+00BB|	»|	c2 bb|
|U+00BC|	¼|	c2 bc|
|U+00BD|	½|	c2 bd|
|U+00BE|	¾|	c2 be|
|U+00BF|	¿|	c2 bf|
|U+00C0|	À|	c3 80|
|U+00C1|	Á|	c3 81|
|U+00C2|	Â|	c3 82|
|U+00C3|	Ã|	c3 83|
|U+00C4|	Ä|	c3 84|
|U+00C5|	Å|	c3 85|
|U+00C6|	Æ|	c3 86|
|U+00C7|	Ç|	c3 87|
|U+00C8|	È|	c3 88|
|U+00C9|	É|	c3 89|
|U+00CA|	Ê|	c3 8a|
|U+00CB|	Ë|	c3 8b|
|U+00CC|	Ì|	c3 8c|
|U+00CD|	Í|	c3 8d|
|U+00CE|	Î|	c3 8e|
|U+00CF|	Ï|	c3 8f|
|U+00D0|	Ð|	c3 90|
|U+00D1|	Ñ|	c3 91|
|U+00D2|	Ò|	c3 92|
|U+00D3|	Ó|	c3 93|
|U+00D4|	Ô|	c3 94|
|U+00D5|	Õ|	c3 95|
|U+00D6|	Ö|	c3 96|
|U+00D7|	×|	c3 97|
|U+00D8|	Ø|	c3 98|
|U+00D9|	Ù|	c3 99|
|U+00DA|	Ú|	c3 9a|
|U+00DB|	Û|	c3 9b|
|U+00DC|	Ü|	c3 9c|
|U+00DD|	Ý|	c3 9d|
|U+00DE|	Þ|	c3 9e|
|U+00DF|	ß|	c3 9f|
|U+00E0|	à|	c3 a0|
|U+00E1|	á|	c3 a1|
|U+00E2|	â|	c3 a2|
|U+00E3|	ã|	c3 a3|
|U+00E4|	ä|	c3 a4|
|U+00E5|	å|	c3 a5|
|U+00E6|	æ|	c3 a6|
|U+00E7|	ç|	c3 a7|
|U+00E8|	è|	c3 a8|
|U+00E9|	é|	c3 a9|
|U+00EA|	ê|	c3 aa|
|U+00EB|	ë|	c3 ab|
|U+00EC|	ì|	c3 ac|
|U+00ED|	í|	c3 ad|
|U+00EE|	î|	c3 ae|
|U+00EF|	ï|	c3 af|
|U+00F0|	ð|	c3 b0|
|U+00F1|	ñ|	c3 b1|
|U+00F2|	ò|	c3 b2|
|U+00F3|	ó|	c3 b3|
|U+00F4|	ô|	c3 b4|
|U+00F5|	õ|	c3 b5|
|U+00F6|	ö|	c3 b6|
|U+00F7|	÷|	c3 b7|
|U+00F8|	ø|	c3 b8|
|U+00F9|	ù|	c3 b9|
|U+00FA|	ú|	c3 ba|
|U+00FB|	û|	c3 bb|
|U+00FC|	ü|	c3 bc|
|U+00FD|	ý|	c3 bd|
|U+00FE|	þ|	c3 be|
|U+00FF|	ÿ|	c3 bf|
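
The byte sequences in the table above can be regenerated with a few lines of Python. This is only a sketch for checking the values; the helper name utf8_hex is made up for illustration and is not part of Flume. Note that the table stops at U+00FF, while UTF-8 uses 2 bytes for every code point from U+0080 up to U+07FF, so the affected range is larger than what is listed.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of one code point as space-separated hex bytes."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02x}" for byte in encoded)


# One (label, hex bytes) pair per code point covered by the table.
rows = [(f"U+{cp:04X}", utf8_hex(cp)) for cp in range(0x80, 0x100)]

for label, encoded in rows:
    print(f"|{label}|\t{encoded}|")
```

Every row starts with the lead byte c2 (U+0080 to U+00BF) or c3 (U+00C0 to U+00FF), which matches the table.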


> Spooling Directory Source doesn't handle files with large-ish event data
> ------------------------------------------------------------------------
>
>                 Key: FLUME-2241
>                 URL: https://issues.apache.org/jira/browse/FLUME-2241
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>         Environment: Debian 6.0.5
>            Reporter: Viktor Trako
>
> I have a Flume agent set up with a spooling directory source, sinking data to Cassandra.
> I'm collecting web data, writing one line to the log file for each request; once the log file has been rotated, it is dropped into the spooling directory, ready for Flume to start processing it. All data is valid JSON, as it is validated before being written to the log file.
> Sending a mixture of differently sized requests from 9-15k seems fine. I generated a log file of over 400 MB and all of it reached the sink correctly.
> I'm currently logging a 19k request and this is when things start to break. It only gets as far as the 1800th request in the file, and the next one is truncated.
> I changed the sink to a file-roll sink and it only gets as far as 29 MB.
> I have profiled it and it's not running out of memory. I want to know if there are any limitations on the spooling directory source.
> Has anyone tried dropping a file with similarly large requests and experienced a similar issue?
> Any pointers would be greatly appreciated. My Flume config is as follows:
> {code:title=flume_conf|borderStyle=solid}
> orion.sources = spoolDir
> orion.channels = fileChannel
> orion.sinks= cassandra
> orion.channels.fileChannel.type = file
> orion.channels.fileChannel.capacity = 1000000
> orion.channels.fileChannel.transactionCapacity = 100
> orion.channels.fileChannel.keep-alive = 60
> orion.channels.fileChannel.write-timeout = 60
> orion.sinks.cassandra.type = com.btoddb.flume.sinks.cassandra.CassandraSink
> orion.sinks.cassandra.hosts = <cluster node ip>
> orion.sinks.cassandra.cluster_name = fake_cluster
> orion.sinks.cassandra.port = 9160
> orion.sinks.cassandra.keyspace-name = Keysp
> orion.sinks.cassandra.records-colfam = <table>
> orion.sources.spoolDir.type = spooldir
> orion.sources.spoolDir.spoolDir = /var/log/orion/flumeSpooling
> orion.sources.spoolDir.deserializer = LINE
> orion.sources.spoolDir.inputCharset = UTF-8
> orion.sources.spoolDir.deserializer.maxLineLength = 20000000
> orion.sources.spoolDir.deletePolicy = never
> orion.sources.spoolDir.batchSize = 100
> orion.sources.spoolDir.interceptors = addSrc addHost addTimestamp addUUID
> orion.sources.spoolDir.interceptors.addSrc.type = regex_extractor
> orion.sources.spoolDir.interceptors.addSrc.regex = \"service\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addSrc.serializers = s1
> orion.sources.spoolDir.interceptors.addSrc.serializers.s1.name = src
> orion.sources.spoolDir.interceptors.addUUID.type = regex_extractor
> orion.sources.spoolDir.interceptors.addUUID.regex = \"uuid\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addUUID.serializers = s1
> orion.sources.spoolDir.interceptors.addUUID.serializers.s1.name = key
> orion.sources.spoolDir.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
> orion.sources.spoolDir.interceptors.addHost.preserveExisting = false
> orion.sources.spoolDir.interceptors.addHost.useIP = true
> orion.sources.spoolDir.interceptors.addHost.hostHeader = host
> orion.sources.spoolDir.interceptors.addTimestamp.type = regex_extractor
> orion.sources.spoolDir.interceptors.addTimestamp.regex = \"timestamp\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addTimestamp.serializers = s1
> orion.sources.spoolDir.interceptors.addTimestamp.serializers.s1.name = timestamp
> orion.sources.spoolDir.channels = fileChannel
> orion.sinks.cassandra.channel = fileChannel
> {code}
> Is this potentially a bug? If nobody has tried this yet, can someone try to recreate it? I would expect the same error to occur.
> Don't hesitate to contact me for further info.
> Viktor
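
For anyone trying to recreate this, a test file could be generated along the following lines. This is only a sketch under assumptions: the field names service, uuid and timestamp are taken from the interceptor regexes in the config above, while the body field, the timestamp format, the file name and the function names are invented for illustration. The filler uses "é" (U+00E9), one of the 2-byte characters from the comment, and the JSON is written without spaces after the colons so that the regexes in the config would still match.

```python
import json
import os
import tempfile
import uuid


def make_event(payload_bytes: int) -> str:
    """Build one JSON line whose UTF-8 size is at least payload_bytes."""
    # "é" is encoded as the two bytes c3 a9, so half as many characters are needed.
    filler = "é" * (payload_bytes // 2)
    event = {
        "service": "demo",  # hypothetical value
        "uuid": str(uuid.uuid4()),
        "timestamp": "2013-11-21T17:22:38Z",  # assumed format
        "body": filler,  # hypothetical field holding the large payload
    }
    # ensure_ascii=False keeps "é" as raw UTF-8 instead of a \u00e9 escape.
    return json.dumps(event, ensure_ascii=False, separators=(",", ":"))


def write_spool_file(path: str, lines: int, payload_bytes: int) -> None:
    """Write the given number of events to path, one JSON document per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for _ in range(lines):
            handle.write(make_event(payload_bytes) + "\n")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "repro.log")
    write_spool_file(target, lines=2000, payload_bytes=19_000)
    print(target, os.path.getsize(target))
```

The file is written to a temporary directory on purpose: the spooling directory source expects files to be complete and unchanged once they appear, so the finished file should be moved into the spooling directory rather than written there directly.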


