Posted to dev@kafka.apache.org by Nisarg Shah <sn...@gmail.com> on 2016/06/20 05:43:31 UTC

Kafka Connect Transformers

Hello,

I am looking to work on https://issues.apache.org/jira/browse/KAFKA-3209 and would like feedback from the devs on the design I’m proposing. Thanks a lot for all the discussions, Ewen Cheslack-Postava.

The gist of the plan is to use ‘Transformers’ that can be chained together through configuration; data passes through the chain between the source and the destination in Kafka Connect.

To set up transformers, we propose a property that lists the Transformer classes in the order in which they should run:
transformer=abc.Transformer1,xyz.Transformer2

Each transformer can receive its own properties from the same properties file, as is the case with Connectors.
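
As an illustration only, here is a minimal sketch of how a chain definition like the one above could be read from a properties file. The "transformer" key comes from the example above; the per-transformer "<class name>." key prefix, the keys "field" and "mode", and the helper names chainOf and propsFor are assumptions made for this sketch and are not part of the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class TransformerConfigSketch {

    // Splits the comma-separated "transformer" property into class names, in order.
    static List<String> chainOf(Properties props) {
        List<String> names = new ArrayList<>();
        for (String name : props.getProperty("transformer", "").split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // Collects the properties for one transformer, assuming a hypothetical
    // "<class name>." prefix convention, and strips that prefix from the keys.
    static Map<String, String> propsFor(String className, Properties props) {
        String prefix = className + ".";
        Map<String, String> own = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                own.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return own;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("transformer", "abc.Transformer1,xyz.Transformer2");
        props.setProperty("abc.Transformer1.field", "id");   // hypothetical key
        props.setProperty("xyz.Transformer2.mode", "upper"); // hypothetical key

        // Print each transformer in chain order with the properties it would receive.
        for (String name : chainOf(props)) {
            System.out.println(name + " -> " + propsFor(name, props));
        }
    }
}
```

The map returned by propsFor is the kind of value that could be handed to each transformer when it is set up.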

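As for the signature of the transformation function that does the actual work, how does this interface look?
import java.util.Map;

public abstract class Transformer<T1, T2> {
    // Converts one input of type T1 into an output of type T2.
    public abstract T2 transform(T1 t1);

    // Optional hook for receiving this transformer's properties; no-op by default.
    public void initialize(Map<String, String> props) {}
}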
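
To make the chaining idea concrete, here is a rough sketch of how a runtime might feed the output of one transformer into the next. The Transformer class is repeated from the interface above so that the example compiles on its own; ParseInt, Doubler and runChain are hypothetical names invented for this sketch. Because the chain is assembled from configuration, the types between stages are only checked at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TransformerChainSketch {

    // The interface proposed above, repeated so the sketch compiles on its own.
    abstract static class Transformer<T1, T2> {
        public abstract T2 transform(T1 t1);
        public void initialize(Map<String, String> props) {}
    }

    // Hypothetical transformer: parses a String into an Integer.
    static class ParseInt extends Transformer<String, Integer> {
        @Override
        public Integer transform(String s) {
            return Integer.parseInt(s.trim());
        }
    }

    // Hypothetical transformer: doubles an Integer.
    static class Doubler extends Transformer<Integer, Integer> {
        @Override
        public Integer transform(Integer i) {
            return i * 2;
        }
    }

    // Feeds the output of each transformer into the next one. A type mismatch
    // between two stages surfaces as a ClassCastException when transform() runs.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object runChain(List<Transformer<?, ?>> chain, Object input) {
        Object current = input;
        for (Transformer stage : chain) {
            current = stage.transform(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Transformer<?, ?>> chain = new ArrayList<>();
        chain.add(new ParseInt());
        chain.add(new Doubler());
        System.out.println(runChain(chain, " 21 "));
    }
}
```

Swapping the two stages in this chain would fail at runtime rather than at compile time, since the chain is typed as Transformer<?, ?>.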

Approach 1:
Pass the complete batch.
Just as the *Tasks receive a complete List<*Record>, the transformer can receive the same list. Passing the whole list makes it possible to rearrange or merge records, which helps if a transformation needs to look at earlier or later messages in the batch. Allowing custom datatypes between transformers lets intermediate stages pass custom objects to each other, although casting could be an issue.
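
A small sketch of what a whole-list transformer could look like under Approach 1. Rec is a hypothetical stand-in for a Connect record (it is not the real SourceRecord or SinkRecord), and MergeAdjacent is an invented example that merges consecutive records sharing a key, which changes the number of records in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BatchTransformerSketch {

    // The proposed interface, repeated so the sketch compiles on its own.
    abstract static class Transformer<T1, T2> {
        public abstract T2 transform(T1 t1);
        public void initialize(Map<String, String> props) {}
    }

    // Hypothetical stand-in for a Connect record; keys are assumed non-null.
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Merges each run of consecutive records that share a key into one record
    // whose value is the comma-joined values of the run.
    static class MergeAdjacent extends Transformer<List<Rec>, List<Rec>> {
        @Override
        public List<Rec> transform(List<Rec> batch) {
            List<Rec> out = new ArrayList<>();
            for (Rec r : batch) {
                int last = out.size() - 1;
                if (last >= 0 && out.get(last).key.equals(r.key)) {
                    out.set(last, new Rec(r.key, out.get(last).value + "," + r.value));
                } else {
                    out.add(r);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Rec> batch = new ArrayList<>();
        batch.add(new Rec("a", "1"));
        batch.add(new Rec("a", "2"));
        batch.add(new Rec("b", "3"));
        batch.add(new Rec("a", "4"));
        System.out.println(new MergeAdjacent().transform(batch));
    }
}
```

Only records within a single batch can be merged this way; a run that spans two batches would stay split.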

Approach 2:
Take the simpler approach and transform message by message. The transformer could keep state from previous messages, but could not look ahead in the list. Going by the comments from Michael Graff, both approaches would work, but if looking ahead is required, we would have to go with Approach 1.
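
For comparison, a sketch of a message-by-message transformer under Approach 2 that keeps state from the previous message. Delta is a hypothetical example, and the plain Long payload is an assumption made to keep the sketch short; a real record type would carry more than a number.

```java
import java.util.Map;

class PerMessageTransformerSketch {

    // The proposed interface, repeated so the sketch compiles on its own.
    abstract static class Transformer<T1, T2> {
        public abstract T2 transform(T1 t1);
        public void initialize(Map<String, String> props) {}
    }

    // Emits the difference between each value and the one seen just before it.
    static class Delta extends Transformer<Long, Long> {
        private Long previous; // state carried over from the previous message

        @Override
        public Long transform(Long value) {
            // The first message has no predecessor, so its delta is reported as 0.
            long delta = (previous == null) ? 0L : value - previous;
            previous = value;
            return delta;
        }
    }

    public static void main(String[] args) {
        Delta delta = new Delta();
        long[] values = {10L, 15L, 12L};
        for (long v : values) {
            System.out.println(v + " -> " + delta.transform(v));
        }
    }
}
```

The transformer can look back through its own stored state, but it has no way to see the message that comes next.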

I will also have a working change ready for Approach 1 very soon; until then, please send me your suggestions.

Thanks,
Nisarg.





Re: Kafka Connect Transformers

Posted by Gwen Shapira <gw...@confluent.io>.
Added wiki access. Enjoy :)

On Fri, Jul 1, 2016 at 11:24 AM, Nisarg Shah <sn...@gmail.com> wrote:
> Need to submit a KIP for https://issues.apache.org/jira/browse/KAFKA-3209. Please provide wiki write access to ‘snisarg’.
>
> Thanks,
> Nisarg Shah.

Re: Kafka Connect Transformers

Posted by Nisarg Shah <sn...@gmail.com>.
Need to submit a KIP for https://issues.apache.org/jira/browse/KAFKA-3209. Please provide wiki write access to ‘snisarg’. 

Thanks,
Nisarg Shah.

> On Jun 28, 2016, at 6:27 PM, Nisarg Shah <sn...@gmail.com> wrote:
> 
> Need permissions to edit the wiki. Username is ‘snisarg’. 
> 
> Thanks,
> Nisarg.


Re: Kafka Connect Transformers

Posted by Nisarg Shah <sn...@gmail.com>.
Need permissions to edit the wiki. Username is ‘snisarg’. 

Thanks,
Nisarg.

> On Jun 28, 2016, at 09:08, Nisarg Shah <sn...@gmail.com> wrote:
> 
> Hello,
> 
> I need to create a page so that I can write a Kafka Improvement Proposal for the below. My username is ‘snisarg’. 
> 
> Thanks,
> Nisarg


Re: Kafka Connect Transformers

Posted by Nisarg Shah <sn...@gmail.com>.
Hello,

I need to create a wiki page so that I can write a Kafka Improvement Proposal (KIP) for the transformer proposal that started this thread. My username is ‘snisarg’.

Thanks,
Nisarg
