Posted to jira@kafka.apache.org by "whsoul (Jira)" <ji...@apache.org> on 2020/01/15 10:46:00 UTC

[jira] [Commented] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

    [ https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015833#comment-17015833 ] 

whsoul commented on KAFKA-9436:
-------------------------------

[https://github.com/apache/kafka/pull/7965]

> New Kafka Connect SMT for plainText => Struct(or Map)
> -----------------------------------------------------
>
>                 Key: KAFKA-9436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: whsoul
>            Priority: Major
>
> I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load the result into document databases such as MongoDB or Elasticsearch.
>  
> For example
>  
> plain text apache log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
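> With an SMT configuration like the one below, a regular expression can transform each plain-text row into Struct (or Map) data: each capture group in the regex is assigned, in order, to a field name from the mapping.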
>  
> {code:java}
> "transforms": "TimestampTopic, RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
> "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>  
> I have a PR for this.
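
For illustration only, here is a minimal standalone Java sketch of the idea described in the issue: apply the regex to the sample log line and pair each capture group with a name from the mapping. It does not use the Kafka Connect API and is not the code from the PR. The class name RegexToMapSketch, the parse helper, and the reading of the ":NUMBER" suffix as "convert this field to a Long" are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Kafka Connect or of the PR.
class RegexToMapSketch {
    // Same regex as in the config above (JSON and Java share these escapes).
    static final Pattern LOG_PATTERN = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"");

    // Same mapping string as in the config above.
    static final String MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    // The sample Apache log line from the issue, without the outer JSON quotes.
    static final String SAMPLE_LINE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" "
        + "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Returns field name -> value in mapping order, or null if the line does not match.
    static Map<String, Object> parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = MAPPING.split(",");
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.length && i < m.groupCount(); i++) {
            // Split "Ms:NUMBER" into name and optional type hint.
            String[] nameAndType = fields[i].split(":", 2);
            String raw = m.group(i + 1);
            // Assumption: ":NUMBER" means "parse as a long". A value such as "-"
            // would throw NumberFormatException in this sketch.
            boolean numeric = nameAndType.length == 2 && nameAndType[1].equals("NUMBER");
            result.put(nameAndType[0], numeric ? (Object) Long.valueOf(raw) : raw);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE_LINE));
    }
}
```

A real transform would also need to decide how to handle rows that do not match the regex and placeholder values such as "-" in numeric fields; this sketch simply returns null in the first case and does not handle the second.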



--
This message was sent by Atlassian Jira
(v8.3.4#803005)