Posted to user@flume.apache.org by Mungeol Heo <mu...@gmail.com> on 2015/01/30 10:34:52 UTC

Flume gives "java.lang.IllegalArgumentException" when using regex_extractor for extracting timestamp from apache access log

case 1:

The settings I used are listed below.

----------
agent01.sources.source01.interceptors.interceptor02.type = regex_extractor
agent01.sources.source01.interceptors.interceptor02.regex = ^\\d+\\.\\d+.\\d+.\\d+\\s\\S+\\s\\S+\\s\\[(\\d+\\/[a-zA-z]{3}\\/\\d{4}:\\d{2}:\\d{2}:\\d{2})\\s\\+0900\\]\\s
agent01.sources.source01.interceptors.interceptor02.serializers = s01
agent01.sources.source01.interceptors.interceptor02.serializers.s01.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
agent01.sources.source01.interceptors.interceptor02.serializers.s01.pattern = dd/MMM/yyyy:HH:mm:ss
agent01.sources.source01.interceptors.interceptor02.serializers.s01.name = timestamp
----------

It gives me a 'java.lang.IllegalArgumentException: Invalid format:
"30/Jan/2015:15:01:03" is malformed at "Jan/2015:15:01:03"' error.



case 2:

The settings I used are listed below.

----------
regex = ^\\d+\\.\\d+.\\d+.\\d+\\s\\S+\\s\\S+\\s\\[\\d+\\/([a-zA-z]{3})\\/\\d{4}:\\d{2}:\\d{2}:\\d{2}\\s\\+0900\\]\\s
pattern = MMM
----------

It gives me a 'java.lang.IllegalArgumentException: Invalid format:
"Jan"' error.



case 3:

The settings I used are listed below.

----------
regex  = ^\\d+\\.\\d+.\\d+.\\d+\\s\\S+\\s\\S+\\s\\[\\d+\\/[a-zA-z]{3}(\\/\\d{4}:\\d{2}:\\d{2}:\\d{2})\\s\\+0900\\]\\s
pattern  = /yyyy:HH:mm:ss
----------

and

----------
regex = ^\\d+\\.\\d+.\\d+.\\d+\\s\\S+\\s\\S+\\s\\[(\\d+\\/)[a-zA-z]{3}\\/\\d{4}:\\d{2}:\\d{2}:\\d{2}\\s\\+0900\\]\\s
pattern = dd/
----------

Both of these work OK.

So, as far as I can see, Flume throws the 'java.lang.IllegalArgumentException'
because it fails to map "Jan" using the "MMM" pattern.

BTW, I am using Cloudera Express 5.3.1.
Also, the case 1 settings work fine on another server, which runs
Java 1.6.0_29.

Is it true that a different Java version is the reason that mapping
"Jan" with the "MMM" pattern fails?
Is there anything that I missed?
Any help would be great.

Thank you

- mungeol

Re: Flume gives "java.lang.IllegalArgumentException" when using regex_extractor for extracting timestamp from apache access log

Posted by Mungeol Heo <mu...@gmail.com>.
I found the cause of the error described above.
It was the LANG setting of the system.
Everything works fine after changing LANG to "en_US.UTF-8".
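
For anyone who hits the same error, here is a minimal sketch of why the locale matters. It is not Flume's own code: it uses the JDK's java.text.SimpleDateFormat as a stand-in for whatever date parser the serializer uses internally, and the class name MonthLocaleDemo and the helper parses() are hypothetical. The Korean locale is only an assumption for illustration; the thread does not say what LANG was set to before the change. The point is that "MMM" is matched against the month names of a locale, so "Jan" parses under an English locale but not under one whose month abbreviations differ, while purely numeric patterns such as "/yyyy:HH:mm:ss" are unaffected.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class MonthLocaleDemo {

    // Returns true if `text` can be parsed with `pattern` using the
    // month names of the given locale, false otherwise.
    static boolean parses(String text, String pattern, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setLenient(false);
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String timestamp = "30/Jan/2015:15:01:03";
        String pattern = "dd/MMM/yyyy:HH:mm:ss";

        // The JVM default locale is normally derived from LANG / LC_ALL.
        System.out.println("default locale: " + Locale.getDefault());

        // English month names: "Jan" is recognized.
        System.out.println("ENGLISH: " + parses(timestamp, pattern, Locale.ENGLISH));

        // Assumed non-English locale: "Jan" is not one of its month names.
        System.out.println("KOREAN:  " + parses(timestamp, pattern, Locale.KOREAN));

        // Numeric-only pattern (case 3): no month names involved.
        System.out.println("numeric: " + parses("/2015:15:01:03", "/yyyy:HH:mm:ss", Locale.KOREAN));
    }
}
```

If changing LANG for the whole system is not an option, passing -Duser.language=en -Duser.country=US to the JVM that runs the agent is a standard way to set the JVM default locale; I have not verified that against this particular Flume setup.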
