You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/02 14:53:03 UTC

[GitHub] [pulsar] cbornet opened a new issue, #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

cbornet opened a new issue, #15902:
URL: https://github.com/apache/pulsar/issues/15902

   
   <!---
   Instructions for creating a PIP using this issue template:
   
    1. The author(s) of the proposal will create a GitHub issue ticket using this template.
       (Optionally, it can be helpful to send a note discussing the proposal to
       dev@pulsar.apache.org mailing list before submitting this GitHub issue. This discussion can
       help developers gauge interest in the proposed changes before formalizing the proposal.)
    2. The author(s) will send a note to the dev@pulsar.apache.org mailing list
       to start the discussion, using subject prefix `[PIP] xxx`. To determine the appropriate PIP
       number `xxx`, inspect the mailing list (https://lists.apache.org/list.html?dev@pulsar.apache.org)
       for the most recent PIP. Add 1 to that PIP"s number to get your PIP"s number.
    3. Based on the discussion and feedback, some changes might be applied by
       the author(s) to the text of the proposal.
    4. Once some consensus is reached, there will be a vote to formally approve
       the proposal. The vote will be held on the dev@pulsar.apache.org mailing list. Everyone
       is welcome to vote on the proposal, though it will considered to be binding
       only the vote of PMC members. It will be required to have a lazy majority of
       at least 3 binding +1s votes. The vote should stay open for at least 48 hours.
    5. When the vote is closed, if the outcome is positive, the state of the
       proposal is updated and the Pull Requests associated with this proposal can
       start to get merged into the master branch.
   
   -->
   
   ## Motivation
   
   Currently, when users want to modify the data in Pulsar, they need to write a Function.
   For a lot of use cases, it would be handy for them to be able to use a ready-made built-in Function that implements the most common basic transformations like the ones available in [Kafka Connect’s SMTs](https://docs.confluent.io/platform/current/connect/transforms/overview.html).
   This removes users the burden of writing the Function themselves, having to understanding the perks of Pulsar Schemas, coding in a language that they may not master (probably Java if they want to do advanced stuff), and they benefit from battle-tested, maintained, performance-optimised code.
   
   ## Goal
   
   This PIP is about providing a `TransformFunction` that executes a sequence of basic transformations on the data.
   The `TransformFunction` shall be easy to configure, launchable as a built-in NAR.
   The `TransformFunction` shall be able to apply a sequence of common transformations in-memory so we don’t need to execute the `TransformFunction` multiple times and read/write to a topic each time.
   
   This PIP is not about appending such a Function to a Source or a Sink. 
   While this is the ultimate goal, so we can provide an experience similar to Kafka SMTs and avoid a read/write to a topic, this work will be done in a future PIP. 
   It is expected that the code written for this PIP will be reusable in this future work. 
   
   ## API Changes
   
   This PIP will introduce a new `transform` module in `pulsar-function` multi-module project.
 The produced artifact will be a NAR of the TransformFunction.
   
   ## Implementation
   
   When it processes a record, `TransformFunction` will :

   
   * Create a mutable structure `TransformContext` that contains
   
   ```java
   @Data
   public class TransformContext {
       private Context context;
       private Schema<?> keySchema;
       private Object keyObject;
       private boolean keyModified;
       private Schema<?> valueSchema;
       private Object valueObject;
       private boolean valueModified;
       private KeyValueEncodingType keyValueEncodingType;
       private String key;
       private Map<String, String> properties;
       private String outputTopic;
   ```
   
   If the record is a `KeyValue`, the key and value schemas and object are unpacked. Otherwise the `keySchema` and `keyObject` are null.
   
   * Call in sequence the process method of a series of `TransformStep` on this `TransformContext`
   
   ```java
   public interface TransformStep {
       void process(TransformContext transformContext) throws Exception;
   }
   ```
   
   Each `TransformStep` can then modify the `TransformContext` as needed.

   
   * Call the `send()` method of the `TransformContext` which will create the message to send to the outputTopic, repacking the KeyValue if needed.

   
   The `TransformFunction` will read its configuration as Json from `userConfig` in the format:
   
   ```json
   {
     "steps": [
       {
         "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key"
       },
       {
         "type": "merge-key-value"
       },
       {
         "type": "unwrap-key-value"
       },
       {
         "type": "cast", "schema-type": "STRING"
       }
     ]
   }
   ```
   
   Each step is defined by its `type` and uses its own arguments.
   
   This example config applied on a KeyValue<AVRO, AVRO> input record with value `{key={keyField1: key1, keyField2: key2, keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}}` will give after each step:
   ```
   {key={keyField1: key1, keyField2: key2, keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}}(KeyValue<AVRO, AVRO>)

              |
              | ”type": "drop-fields", "fields": "keyField1,keyField2”, "part": "key”
              |
   {key={keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue<AVRO, AVRO>)
              |
              | "type": "merge-key-value"
              |

   {key={keyField3: key3}, value={keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue<AVRO, AVRO>)
              |
              | "type": "unwrap-key-value"
              |
   {keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3} (AVRO)
              |
              | "type": "cast", "schema-type": "STRING"
              |
   {"keyField3": "key3", "valueField1": "value1", "valueField2": "value2", "valueField3": "value3"} (STRING)
   ```
   
   `TransformFunction` will be built as a NAR including a `pulsar-io.yaml` service file so it can be registered as a built-in function with name `transform`.
   
   ## Reject Alternatives
   
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] nlu90 commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
nlu90 commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1150207152

   I would suggest having some concrete implementations of the TransformFunction in a separate repo first instead of starting right in the main pulsar project. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] nlu90 commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
nlu90 commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1154455610

   @dave2wave @cbornet @nicoloboschi 
   
   Thanks for your replies. I really would like to hear more about use cases and performance motivations to better understand it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli closed issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
eolivelli closed issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations
URL: https://github.com/apache/pulsar/issues/15902


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1253234883

   This PIP was not accepted
   
   The work has been posted here
   https://github.com/datastax/pulsar-transformations
   and it is available to anyone, it is Apache 2 licensed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] cbornet commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
cbornet commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1146054014

   Thanks Enrico. I did the updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1183906374

   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1145806303

   I would not go too much into the implementation details in the PIP like `TransformContext`
   
   I would only cite the steps, the configurations and the first operations that will be available.
   the TransformContext without the PR is very hard to understand and actually, it is a implementation detail, hidden to the users and we will change it in the future as needed.
   
   I would add to "Reject Alternatives":
   
   > Create a separate third party project not managed by the Pulsar community.
   > Problems:
   > it won't be easily available to all Pulsar users
   > it would be hard to guarantee compatibility with many Pulsar versions, and the Transformations will use many advanced features of Pulsar APIs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] cbornet commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
cbornet commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1150879270

   > I would suggest having some concrete implementations of the TransformFunction in a separate repo first instead of starting right in the main pulsar project.
   
   @nlu90 see the comment from @eolivelli . We've put this in "Rejected alternatives" for the reasons that:
   * it won't be easily available to all Pulsar users
   * it would be hard to guarantee compatibility with many Pulsar versions, and the Transformations will use many advanced features of Pulsar APIs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] nicoloboschi commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
nicoloboschi commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1150922439

   @nlu90 The transformations will be well tested in the codebase. The use cases do not came out of the blue (in that case you may wonder if they add value for users) but they are inspired by Kafka/Confluent Platform.
   
   The very good thing about this proposal is that they are builtin but they won't be an additional weight for users that do not want to use them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dave2wave commented on issue #15902: [PIP] PIP-173 : Create a built-in Function implementing the most common basic transformations

Posted by GitBox <gi...@apache.org>.
dave2wave commented on issue #15902:
URL: https://github.com/apache/pulsar/issues/15902#issuecomment-1151144544

   @nlu90 Would you like to have a conversation with @cbornet to discuss the performance motivations for this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org