Posted to jira@kafka.apache.org by "whsoul (Jira)" <ji...@apache.org> on 2022/06/14 12:16:00 UTC

[jira] [Commented] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

    [ https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554075#comment-17554075 ] 

whsoul commented on KAFKA-9436:
-------------------------------

This was simplified according to the review from Chris.

 

1. String parse
{code:java}
dev_kafka_pc001_1580372261372{code}
{code:java}
"transforms": "RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ParseStructByRegex$Value",

"transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
"transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime"{code}
 
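The regex and mapping from the string-parse example can be checked outside Kafka Connect with plain java.util.regex. The sketch below is only an illustration of how capture groups would pair up with the mapping names; the class and method names (RegexMappingSketch, parse) are hypothetical and this is not the SMT's actual implementation. Returning an empty map for non-matching input is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the ParseStructByRegex code from the PR.
public class RegexMappingSketch {

    // Pairs capture group i (1-based) with the i-th name in the comma-separated mapping.
    static Map<String, String> parse(String regex, String mapping, String input) {
        String[] names = mapping.split(",");
        Matcher m = Pattern.compile(regex).matcher(input);
        Map<String, String> out = new LinkedHashMap<>();
        // find() rather than matches(): the regex is anchored only at the start.
        if (!m.find()) {
            return out; // assumption: unmatched input yields an empty map
        }
        int n = Math.min(names.length, m.groupCount());
        for (int i = 0; i < n; i++) {
            out.put(names[i].trim(), m.group(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(
            "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
            "env,serviceId,device,sequence,datetime",
            "dev_kafka_pc001_1580372261372");
        // prints {env=dev, serviceId=kafka, device=pc, sequence=001, datetime=1580372261372}
        System.out.println(fields);
    }
}
```

All values come out as strings here; any type conversion would be a separate step.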

 

2. plain text apache log
{code:java}
111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"
{code}
The SMT configuration below uses a regular expression to transform the plain-text line into Struct (or Map) data.

 
{code:java}
"transforms": "RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ParseStructByRegex$Value",

"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",

"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms,Referrer,UserAgent"{code}

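To see which part of the access-log line each mapping name would receive, the Apache log regex can be run against the sample line with java.util.regex. This is a sketch under assumptions: the class name ApacheLogRegexSketch and its parse method are hypothetical, the sample is taken to be the log line with its quotes unescaped, and the code is not the SMT's own. The JSON-escaped regex from the config happens to be a valid Java string literal as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check of the Apache log regex and mapping; not the SMT's own code.
public class ApacheLogRegexSketch {

    // Same regex as in the config, split across lines only for readability.
    static final Pattern LOG = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"");

    static final String[] NAMES =
        ("IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms,Referrer,UserAgent").split(",");

    // Returns name -> captured text, or an empty map if the line does not match (assumption).
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = LOG.matcher(line);
        if (m.find()) {
            for (int i = 0; i < NAMES.length && i < m.groupCount(); i++) {
                out.put(NAMES[i], m.group(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
            + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
            + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
            + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";
        // Print one "name = value" pair per line.
        parse(line).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sample, BytesSent is captured as the literal "-", so a consumer that expects a number there would need to handle that case.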
> New Kafka Connect SMT for plainText => Struct(or Map)
> -----------------------------------------------------
>
>                 Key: KAFKA-9436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
>             Project: Kafka
>          Issue Type: New Feature
>          Components: KafkaConnect
>            Reporter: whsoul
>            Priority: Major
>              Labels: needs-kip
>
> I'd like to parse and convert plain-text rows to Struct (or Map) data with an SMT, and load the result into a document database such as MongoDB or Elasticsearch.
>  
> For example
>  
> 1. String parse ( with timemillis )
> {code:java}
> {
>    "code" : "dev_kafka_pc001_1580372261372"
>    ,"recode1" : "a"
>    ,"recode2" : "b" 
> }{code}
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
> "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime:TIMEMILLIS"{code}
>  
>  
> 2. plain text apache log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> The SMT configuration below uses a regular expression to transform the plain-text line into Struct (or Map) data.
>  
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
> "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>  
> I have a PR for this.


