Posted to commits@pulsar.apache.org by "dragon-zhang (via GitHub)" <gi...@apache.org> on 2024/01/30 11:16:37 UTC

[PR] [improve][pip] PIP-331: WASM Function API [pulsar]

dragon-zhang opened a new pull request, #21992:
URL: https://github.com/apache/pulsar/pull/21992

   ### Motivation
   
   This PR adds a PIP that proposes a WASM Function API.
   
   ### Documentation
   
   - [x] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1494060140


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));
+        // WASI cannot easily pass Java objects like JNI, here we pass Long
+        // then we can get the argument by Long
+        WasmFunctions.consumer(super.getStore(), process.func(), WasmValType.I64)
+                .accept(argumentId);
+        ARGUMENTS.remove(argumentId);
+        return argumentId;
+    }
+
+    protected abstract T doProcess(X input, Context context, Long argumentId);
+
+    protected abstract Long getArgumentId(X input, Context context);
+
+    @Override
+    public void initialize(Context context) {
+        super.getWasmExtern(INITIALIZE_METHOD_NAME)
+                .ifPresent(initialize -> callWASI(null, context, initialize));
+    }
+
+    @Override
+    public void close() {
+        super.getWasmExtern(CLOSE_METHOD_NAME)
+                .ifPresent(close -> callWASI(null, null, close));
+        super.close();
+    }
+
+    protected static class Argument<X> {
+        protected X input;
+        protected Context context;
+
+        private Argument(X input, Context context) {
+            this.input = input;
+            this.context = context;
+        }
+    }
+}
+```
+
+More detailed code implementation and test can be found in [here](https://github.com/apache/pulsar/pull/21975)
+
+# Security Considerations
+
+Maybe need to add folders with tenancy name in the resource directory to prevent conflicts between WASM file names of different tenancies.

Review Comment:
   > What is the resource directory? 
   
   For example, `resource/{tenancyName}/{wasmFileName}.wasm`
   
   > Is it shared today by Pulsar functions?
   
   It should not be shared. Adding the `tenancy name` to the path is only meant to avoid file-path conflicts between tenancies.
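
As a minimal sketch of the `resource/{tenancyName}/{wasmFileName}.wasm` layout described above: the helper below builds such a path and rejects names that would resolve outside the tenancy's own folder. The class name `WasmResourcePaths` and its `resolve` method are hypothetical and do not come from the PR; only `java.nio.file` from the JDK is used.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: class and method names are illustrative, not part of the PR.
final class WasmResourcePaths {

    private WasmResourcePaths() {
    }

    // Builds {root}/{tenancyName}/{wasmFileName}.wasm and rejects any input
    // that would resolve outside the tenancy's own folder.
    static Path resolve(Path root, String tenancyName, String wasmFileName) {
        Path normalizedRoot = root.normalize();
        Path tenantDir = normalizedRoot.resolve(tenancyName).normalize();
        Path wasmFile = tenantDir.resolve(wasmFileName + ".wasm").normalize();
        boolean tenantInsideRoot = tenantDir.startsWith(normalizedRoot)
                && !tenantDir.equals(normalizedRoot);
        if (!tenantInsideRoot || !wasmFile.startsWith(tenantDir)) {
            throw new IllegalArgumentException("illegal tenancy or file name");
        }
        return wasmFile;
    }
}
```

With this sketch, `WasmResourcePaths.resolve(Paths.get("resource"), "tenant-a", "filter")` yields `resource/tenant-a/filter.wasm`, while a file name such as `../tenant-b/filter` is rejected with an `IllegalArgumentException`.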





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493948637


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.

Review Comment:
   yes





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493962050


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);

Review Comment:
   The WASM bytecode has already been loaded at this point; here we only obtain the exported WASM function so that Java can call it.
   
   The bytecode itself is loaded in the `WasmLoader` constructor (`WasmLoader#WasmLoader`).
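
To illustrate the "load once in the constructor, look things up later" idea, here is a minimal sketch that reads a module eagerly and checks the standard 8-byte header of a core WASM binary (the magic `\0asm` followed by version 1). `WasmModuleBytes` is a hypothetical stand-in, not the PR's `WasmLoader`, and it does not instantiate the module or resolve exports.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the loading step; this is not the PR's WasmLoader.
final class WasmModuleBytes {

    // A core WASM binary starts with "\0asm" followed by the little-endian version 1.
    private static final byte[] HEADER = {0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00};

    private final byte[] bytes;

    // Reads the whole module eagerly so that later lookups never touch the stream.
    WasmModuleBytes(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] loaded = out.toByteArray();
        if (!hasWasmHeader(loaded)) {
            throw new IllegalArgumentException("not a core WASM binary");
        }
        this.bytes = loaded;
    }

    // Returns true when the data begins with the WASM magic number and version 1.
    static boolean hasWasmHeader(byte[] data) {
        if (data.length < HEADER.length) {
            return false;
        }
        for (int i = 0; i < HEADER.length; i++) {
            if (data[i] != HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    int size() {
        return bytes.length;
    }
}
```

Failing fast in the constructor means a corrupt or misnamed file is reported when the function is created rather than on the first `process` call.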





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1994627795

   > In the description of [java-sdk](https://github.com/extism/java-sdk), JSON is also used for deserialization, and it requires the installation of additional dependencies, which is particularly uncomfortable.
   
   That is the old one; a new one based on chicory is on its way.




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988617458

   > > I'm quite new to Pulsar, but reading through the mailing list, it seems that it would also probably help to have optimized support for the function type, which for WASM would probably be very useful.
   > 
   > @lburgazzoli I agree that support for WASM would be very useful. However, the current design of embedding inside the Java runtime doesn't seem very efficient since the JVM will add its overhead. With a pluggable runtime type, it would be fine to have such implementations on the way to a more optimized solution. That's one more reason to have pluggable runtime types, since it lowers the barrier to adding new types or improving what is already available.
   
   Hi, I just read your previous email. My understanding is that you plan to have the Pulsar server call the pluggable runtime directly. Does that mean Pulsar will send messages directly to `pulsar-runtime-wasm`?




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493953463


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.

Review Comment:
   In https://github.com/apache/pulsar/pull/21975, I use [wasmtime-java](https://github.com/kawamuray/wasmtime-java) to implement this PIP. Another contributor suggested using [chicory](https://github.com/dylibso/chicory). Both are Java libraries.
   
   I'll add this to the doc later.





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988612006

   > In addition, there is an awesome WASM plugin system https://github.com/extism/extism which I did not take into account because as said already, I did want to prove the _Chicory_ APIs and also because _Chicory_ is not yet supported
   
   Seems quite nice. I couldn't really figure out from it how or where you define the "interface" of your plugin that users will implement.




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1990910473

   FWIW, in case you go down this path, I would recommend having a look at https://wazero.io/, a pure-Go WASM runtime (which could eventually allow re-using some of the Go functions code).




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1987115561

   @lburgazzoli @asafm @RobertIndie please review it again.




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493970517


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   > Can you please elaborate more on how this handover is done?
   
   Java passes a parameter ID (a `java.lang.Long`) to the other language, and the other language then obtains the raw parameters through that ID.
   
   > Can WASM functions have access to Java variables? How?
   
   Step 1: Java exports functions to the other language.
   
   ```java
   public class WasmFunction extends AbstractWasmFunction<String, String> {
   
       private static final Map<Long, String> RESULTS = new ConcurrentHashMap<>();
   
       @Override
       protected Map<String, Func> initWasmCallJavaFunc(Store<Void> store) {
           Map<String, Func> funcMap = new HashMap<>();
           funcMap.put("get_args",
                   WasmFunctions.wrap(store, WasmValType.I64, WasmValType.I64, WasmValType.I32, WasmValType.I32,
                           (argId, addr, len) -> {
                               String config = "hello from java " + argId;
                               System.out.println("java side->" + config);
                               ByteBuffer buf = super.getBuffer();
                               for (int i = 0; i < len && i < config.length(); i++) {
                                   buf.put(addr.intValue() + i, (byte) config.charAt(i));
                               }
                               return Math.min(config.length(), len);
                           }));
           //......
           return funcMap;
       }
   }
   ```
   
   Step 2: Java calls into WASM through `AbstractWasmFunction#process`.
   
   Step 3: the call reaches the function exported by the WASM module. The following is a Rust example (Rust can be compiled to WASM).
   
   ```rust
   #[no_mangle]
   pub unsafe extern "C" fn process(arg_id: i64) {
       //......
   }
   ```
   
   Step 4: Java passes the arguments to the other language, which fetches them by calling the imported `get_args` function.
   
   ```rust
   #[link(wasm_import_module = "pulsar")]
   extern "C" {
       fn get_args(arg_id: i64, addr: i64, len: i32) -> i32;
   }
   
   #[no_mangle]
   pub unsafe extern "C" fn process(arg_id: i64) {
       let mut buf = [0u8; 32];
       let buf_ptr = buf.as_mut_ptr() as i64;
       // get arg from java
       let len = get_args(arg_id, buf_ptr, buf.len() as i32);
       let java_arg = std::str::from_utf8(&buf[..len as usize]).unwrap();
       eprintln!("rust side-> recv:{}", java_arg);
       // ......
   }
   ```
   
   > Can the variables be of any type?
   
   No, see the supported types in `io.github.kawamuray.wasmtime.WasmValType`.
   
   > How do you map a Java class to a WASM class?
   
   They cannot be mapped directly; serialization and deserialization are required.
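
The `get_args` host function above copies a string into guest memory at `addr`, bounded by `len`, and returns the number of bytes written. The sketch below isolates that contract using only the JDK: a plain `ByteBuffer` stands in for the WASM linear memory, and the class `GuestMemoryWriter` is hypothetical. Unlike the snippet above, which casts each `char` to a `byte`, this variant encodes the string as UTF-8 so that non-ASCII text survives the copy; that is a swapped-in choice, not what the PR does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a plain ByteBuffer stands in for the WASM linear memory.
final class GuestMemoryWriter {

    private GuestMemoryWriter() {
    }

    // Copies at most len UTF-8 bytes of value into memory starting at addr and
    // returns the number of bytes written, mirroring the get_args contract.
    static int writeUtf8(ByteBuffer memory, int addr, int len, String value) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        int count = Math.min(encoded.length, len);
        if (addr < 0 || count < 0 || addr > memory.capacity() - count) {
            throw new IndexOutOfBoundsException("guest buffer outside linear memory");
        }
        for (int i = 0; i < count; i++) {
            memory.put(addr + i, encoded[i]);
        }
        return count;
    }

    // Reads count bytes back, the way the guest would after the call returns.
    static String readUtf8(ByteBuffer memory, int addr, int count) {
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            out[i] = memory.get(addr + i);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

Because the copy is truncated at `len` bytes, a multi-byte character can be cut in half when the guest buffer is too small; a real implementation would probably let the guest query the required length first.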





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1951782053

   Once the discussion is over, I will incorporate the comments into the document.




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1495389552


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.

Review Comment:
   Both repos you mentioned have ~100 stars.
   The wasmtime repo you mentioned doesn't list Java as a supported language. That doesn't sound like solid ground to stand on, does it?
   





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1495401560


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {

Review Comment:
   Well, Go has no inheritance, and I'm pretty sure they have a similar need.





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493959608


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.

Review Comment:
   In brief, Java passes a parameter ID (`java.lang.Long`) to the other language, and the other language then obtains the raw parameters through that ID. The raw parameters can be topics, messages, and so on.
   
   More details are in https://github.com/apache/pulsar/pull/21975#issuecomment-1928993883
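
   The ID handoff described above can be sketched as follows. This is only an illustration under stated assumptions: `ArgumentRegistry`, `ArgumentHandoffSketch`, and `invokeGuest` are hypothetical names, and the guest is simulated with a plain Java `LongFunction` rather than a real WASM call. The host keeps the real object, passes only a `long`, and the guest side uses that ID to ask the host for the data.

   ```java
   import java.util.Map;
   import java.util.Objects;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicLong;
   import java.util.function.LongFunction;

   /** Hypothetical registry: the host keeps the real objects and hands out only long IDs. */
   final class ArgumentRegistry<V> {
       private final AtomicLong nextId = new AtomicLong(1);
       private final Map<Long, V> arguments = new ConcurrentHashMap<>();

       /** Stores the argument and returns the ID that would cross the host/guest boundary. */
       long register(V argument) {
           long id = nextId.getAndIncrement();
           arguments.put(id, Objects.requireNonNull(argument, "argument"));
           return id;
       }

       /** Called (indirectly) by the guest to resolve an ID; returns null for unknown IDs. */
       V lookup(long id) {
           return arguments.get(id);
       }

       void release(long id) {
           arguments.remove(id);
       }

       int size() {
           return arguments.size();
       }
   }

   final class ArgumentHandoffSketch {
       /** Registers the argument, invokes the (simulated) guest with the ID, then cleans up. */
       static <V> String invokeGuest(ArgumentRegistry<V> registry, V argument, LongFunction<String> guest) {
           long id = registry.register(argument);
           try {
               return guest.apply(id);
           } finally {
               registry.release(id);
           }
       }

       public static void main(String[] args) {
           ArgumentRegistry<String> registry = new ArgumentRegistry<>();
           // The "guest" only ever sees the ID and asks the host for the real value.
           String result = invokeGuest(registry, "hello", id -> registry.lookup(id).toUpperCase());
           System.out.println(result);
       }
   }
   ```

   Releasing the ID in a `finally` block is a choice made in this sketch so that a failing guest call does not leave entries behind in the map.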





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988605748

   > > I'm quite new to Pulsar, but reading through the mailing list, it seems that it would also probably help to have optimized support for the function type, which for WASM would probably be very useful.
   > 
   > @lburgazzoli I agree that support for WASM would be very useful. However, the current design of embedding inside the Java runtime doesn't seem very efficient since the JVM will add its overhead. With a pluggable runtime type, it would be fine to have such implementations on the way to a more optimized solution. That's one more reason to have pluggable runtime types since it lowers the barriers to add new types or improve on what is available.
   
   I may not have expressed myself clearly, but I'm 100% in agreement with this :) 




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988593929

   > I'm quite new to Pulsar, but reading through the mailing list, it seems that it would also probably help to have optimized support for the function type, which for WASM would probably be very useful.
   
   @lburgazzoli I agree that support for WASM would be very useful. However, the current design of embedding inside the Java runtime doesn't seem very efficient since the JVM will add its overhead. With a pluggable runtime type, it would be fine to have such implementations on the way to a more optimized solution. That's one more reason to have pluggable runtime types since it lowers the barriers to add new types or improve on what is available.




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516161258


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Well, WASM can easily read a `byte[]` from other languages, but it cannot directly convert it into a Java object, which makes it far inferior to JNI in that respect. On the other hand, JNI's cross-platform story is much worse than WASM's (imagine a user who uses JNI and C to go cross-platform: when they switch to Golang, they have to do it all over again). Any suggestions?





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988628528

   > > I'd like the SDK way, remember [#21975 (comment)](https://github.com/apache/pulsar/pull/21975#issuecomment-1933278909) ? The `WasmLoader` interface is the SPI.
   > 
   > @loongs-zhang Instead of a WASM-specific SPI in Pulsar Functions, I'd rather see full decoupling of all Pulsar Function runtime types in a similar manner, where there could be an SPI which makes runtime types pluggable in Pulsar Functions.
   
   I also agree with this point.




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988617942

   > I'd like the SDK way, remember [#21975 (comment)](https://github.com/apache/pulsar/pull/21975#issuecomment-1933278909) ? The `WasmLoader` interface is the SPI.
   
   @loongs-zhang Instead of a WASM-specific SPI in Pulsar Functions, I'd rather see full decoupling of all Pulsar Function runtime types in a similar manner, where there could be an SPI which makes runtime types pluggable. 
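
   A pluggable runtime SPI of the kind suggested here might look roughly like the sketch below. Nothing in it is an existing Pulsar API: `FunctionRuntimeProvider`, `RuntimeRegistry`, and the `byte[]`-in/`byte[]`-out function shape are assumptions made for illustration only.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.UnaryOperator;

   /** Hypothetical SPI: one implementation per runtime type ("java", "wasm", ...). */
   interface FunctionRuntimeProvider {
       /** The runtime type this provider handles. */
       String runtimeType();

       /** Loads a function from a location and exposes it as bytes-in, bytes-out. */
       UnaryOperator<byte[]> load(String functionLocation);
   }

   /** Hypothetical registry that picks a provider by runtime type. */
   final class RuntimeRegistry {
       private final Map<String, FunctionRuntimeProvider> providers = new ConcurrentHashMap<>();

       // Registration is explicit here; a plugin setup could instead discover
       // implementations with java.util.ServiceLoader.
       void register(FunctionRuntimeProvider provider) {
           providers.put(provider.runtimeType(), provider);
       }

       UnaryOperator<byte[]> load(String runtimeType, String functionLocation) {
           FunctionRuntimeProvider provider = providers.get(runtimeType);
           if (provider == null) {
               throw new IllegalArgumentException("No runtime registered for type: " + runtimeType);
           }
           return provider.load(functionLocation);
       }
   }

   final class RuntimeRegistryDemo {
       public static void main(String[] args) {
           RuntimeRegistry registry = new RuntimeRegistry();
           // A trivial provider that ignores the location and echoes its input.
           registry.register(new FunctionRuntimeProvider() {
               @Override
               public String runtimeType() {
                   return "echo";
               }

               @Override
               public UnaryOperator<byte[]> load(String functionLocation) {
                   return input -> input;
               }
           });
           byte[] output = registry.load("echo", "unused").apply(new byte[] {1, 2, 3});
           System.out.println(output.length);
       }
   }
   ```

   With this shape, a WASM runtime would be just one more provider next to the existing ones, rather than something layered on top of the Java runtime.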




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988302919

   > @lburgazzoli Brilliant. Can you explain how the function accesses the data in a `Message` object in the WASM function, regardless of the language it is written in? 
   
   At this stage the function expects that the payload/value is a `byte[]` and does not perform any check if the actual byte array is something that the guest function can process. It is up to the function to perform any validation if needed.
   
   The assumption is that, if you want to use something like WASM for transformation, you are probably dealing with a payload type that can be reconstructed in the target language (i.e. it is fairly simple if the data format is JSON, YAML, etc.), or you are not interested in the payload because the function only acts on the `Message` metadata. 
   
   I guess this is pretty similar to how a gRPC-like Pulsar Function would work.
   
   So the signature of the function is:
   
   ```java
   public Record<byte[]> process(byte[] input, Context context) throws Exception {
       // impl here
   }
   ```
   
   Then to access the individual part of the `Message`, the host registers a number of functions, like
   
   ```java
   wrap(
       this::getValueFn,
       "pulsar_get_value",
       List.of(),
       List.of(ValueType.I64)),
   wrap(
       this::setValueFn,
       "pulsar_set_value",
       List.of(ValueType.I32, ValueType.I32),
       List.of()),
   ```
   
   For which the related implementation is like:
   
   ```java
       private Value[] getValueFn(Instance instance, Value... args) {
           final byte[] rawData = this.ref.get().value();
   
           return new Value[] {
                   write(rawData)
           };
       }
   
       private Value[] setValueFn(Instance instance, Value... args) {
           final int addr = args[0].asInt();
           final int size = args[1].asInt();
           final byte[] value = instance.memory().readBytes(addr, size);
   
           this.ref.get().value(value);
   
           return new Value[] {};
       }
   ```
   
   Since the core WASM spec does not support threads, the function implementation adds a lock and essentially accesses the current `Message` through a `ThreadLocal`-like implementation.
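
   As a rough sketch of that locking pattern, using only the JDK: the class below is hypothetical (`SerializedGuestCall` and `process` are illustrative names, and the guest is stood in for by a `Runnable` instead of a real WASM instance). It shows one record being published for the duration of a single guest call, with the host-side getter and setter reading and writing that current record.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Locale;
   import java.util.concurrent.atomic.AtomicReference;
   import java.util.concurrent.locks.ReentrantLock;

   /** Hypothetical host-side holder: one guest call at a time, one "current" payload. */
   final class SerializedGuestCall {
       private final ReentrantLock lock = new ReentrantLock();
       private final AtomicReference<byte[]> current = new AtomicReference<>();

       // Host functions the guest would call back into.
       byte[] getValue() {
           return current.get();
       }

       void setValue(byte[] value) {
           current.set(value);
       }

       /** Publishes the input, runs the guest under the lock, and returns what the guest left behind. */
       byte[] process(byte[] input, Runnable guestFunction) {
           lock.lock();
           try {
               current.set(input);
               guestFunction.run();
               return current.get();
           } finally {
               current.set(null); // never leak one call's payload into the next
               lock.unlock();
           }
       }

       public static void main(String[] args) {
           SerializedGuestCall host = new SerializedGuestCall();
           // Stand-in for a guest function that upper-cases the payload.
           Runnable toUpper = () -> {
               String text = new String(host.getValue(), StandardCharsets.UTF_8);
               host.setValue(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
           };
           byte[] result = host.process("hello".getBytes(StandardCharsets.UTF_8), toUpper);
           System.out.println(new String(result, StandardCharsets.UTF_8));
       }
   }
   ```

   The lock serializes guest invocations, which trades throughput for safety; whether that is acceptable depends on the workload.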
   
   The guest then accesses the functions exposed by the host like:
    
   ![image](https://github.com/apache/pulsar/assets/1868933/3927ce7c-32d3-405c-a478-c9159c8404cc)
   
   An example of a function written in rust is then:
   
   ```rust
   #[cfg_attr(all(target_arch = "wasm32"), export_name = "to_upper")]
   #[no_mangle]
   pub extern fn to_upper() {
       let val = get_record_value();
       let res = String::from_utf8(val).unwrap().to_uppercase().as_bytes().to_vec();
   
       set_record_value(res);
   }
   ```
   
   > Can you also explain how does this differ from Component Model?
   
   The difference from the component model is that there is quite some work that must be done on the implementation side. As an example, there is a sort of [SDK](https://github.com/lburgazzoli/pulsar-function-wasm/tree/main/src/main/rust) that helps with writing functions; otherwise the developer has to implement their own ABI that matches the expectations of the host. The component model would free the developer from that aspect, but under the hood the generated code would probably do something similar, since you always have to pass through linear memory for host/guest calls.
   
   >  Is it reasonable to just wait for Component Model to be implemented in a runtime that can be used from Java instead of supporting two flavors?
   
   At this stage I fear that waiting for the component model to become mainstream could considerably delay the adoption of WASM, and it may put a lot of restrictions on which runtimes a developer can use. I really hope it will emerge as soon as possible, but given everything that happened with WASI, it may take time (I hope to be wrong on this).




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1519802390


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   > Do you have any idea what the performance implications are for this step? How does normal Java function throughput compare with this WASM function?
   
   I know this will bring a significant performance drop, but besides serialization/deserialization, I can't think of any way to convert a `byte[]` into a Java object.
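
   For reference, the serialization round trip mentioned above could look like this minimal sketch. `OrderEvent` and its fields are made up for the example, and `DataOutputStream` is used only because it ships with the JDK; a real function would more likely use the topic's schema (JSON, Avro, and so on). The point is that every host/guest crossing pays for one encode and one decode.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.DataInputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.UncheckedIOException;

   /** Hypothetical payload type with a hand-written binary encoding. */
   final class OrderEvent {
       final String topic;
       final long sequence;

       OrderEvent(String topic, long sequence) {
           this.topic = topic;
           this.sequence = sequence;
       }

       /** Encode: this is the form that can be copied into the guest's linear memory. */
       byte[] toBytes() {
           ByteArrayOutputStream buffer = new ByteArrayOutputStream();
           try (DataOutputStream out = new DataOutputStream(buffer)) {
               out.writeUTF(topic);
               out.writeLong(sequence);
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
           return buffer.toByteArray();
       }

       /** Decode: rebuilds the Java object from bytes handed back by the guest. */
       static OrderEvent fromBytes(byte[] bytes) {
           try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
               return new OrderEvent(in.readUTF(), in.readLong());
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
       }
   }

   final class SerdesRoundTrip {
       public static void main(String[] args) {
           byte[] wire = new OrderEvent("orders", 42L).toBytes();
           OrderEvent decoded = OrderEvent.fromBytes(wire);
           System.out.println(decoded.topic + " #" + decoded.sequence);
       }
   }
   ```

   Measuring this encode/decode pair separately from the WASM call itself would help show how much of the throughput drop comes from serialization.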





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988551611

   > FWIW
   > 
   > I've been working on some POCs to enable WASM functions in Kafka Connect, Pulsar Functions and Apache Camel, where the focus was mainly to prove that [_Chicory_](https://github.com/dylibso/chicory), a pure Java WASM runtime, was good enough to be leveraged by Kafka Connect, Pulsar Functions, and Camel:
   > 
   > * https://lburgazzoli.github.io/posts/2024-02-01_apache_kafka_connect_meets_wasm_part_1/
   > * https://github.com/lburgazzoli/kafka-connect-wasm-transformer
   > * https://github.com/lburgazzoli/pulsar-function-wasm
   > * https://lburgazzoli.github.io/posts/2024-01-14_apache_camel_meets_wasm_part_1/
   > 
   > Because the focus was on _Chicory_, a number of assumptions have been made and no performance tests have been executed.
   > 
   > The different POCs also have different implementations. As an example, within Apache Camel the whole message in transit is serialized/deserialized to/from the WASM guest function, whereas in the Kafka Connect and Pulsar POCs, the developer can leverage a number of exported functions in order to access individual parts of the native envelope (Message, Record, etc.) to reduce serdes costs.
   > 
   > I did explore the [component-model](https://component-model.bytecodealliance.org/introduction.html) and I think it would become the right thing to use, but since it was not supported by _Chicory_ and also not widely supported across the WASM runtimes I know, I opted for a conservative approach where the only requirement is to support the WASM spec (WASI is not even required). The side effect is that an SDK would be really appreciated to reduce the boilerplate developers have to write.
   
   I'd like the SDK way, remember https://github.com/apache/pulsar/pull/21975#issuecomment-1933278909 ? The `WasmLoader` interface is the SPI.




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493771613


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.

Review Comment:
   "provided the java, python and golang function client." - Do you mean function runtime?
   



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));
+        // WASI cannot easily pass Java objects like JNI, here we pass Long
+        // then we can get the argument by Long
+        WasmFunctions.consumer(super.getStore(), process.func(), WasmValType.I64)
+                .accept(argumentId);
+        ARGUMENTS.remove(argumentId);
+        return argumentId;
+    }
+
+    protected abstract T doProcess(X input, Context context, Long argumentId);
+
+    protected abstract Long getArgumentId(X input, Context context);
+
+    @Override
+    public void initialize(Context context) {
+        super.getWasmExtern(INITIALIZE_METHOD_NAME)
+                .ifPresent(initialize -> callWASI(null, context, initialize));
+    }
+
+    @Override
+    public void close() {
+        super.getWasmExtern(CLOSE_METHOD_NAME)
+                .ifPresent(close -> callWASI(null, null, close));
+        super.close();
+    }
+
+    protected static class Argument<X> {
+        protected X input;
+        protected Context context;
+
+        private Argument(X input, Context context) {
+            this.input = input;
+            this.context = context;
+        }
+    }
+}
+```
+
+More detailed code implementation and test can be found in [here](https://github.com/apache/pulsar/pull/21975)
+
+# Security Considerations
+
+Maybe need to add folders with tenancy name in the resource directory to prevent conflicts between WASM file names of different tenancies.

Review Comment:
   What is the resource directory? Is it shared today by Pulsar functions?
   



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.

Review Comment:
   The assumption here is that you plan to use an existing WASM Java library that can load and invoke a WASM function?
   



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.

Review Comment:
   Parameters of the function? I thought Pulsar functions accept a message as a parameter, and that's it, no? Can you provide some context on the parameters of a function?



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design

Review Comment:
   It would be very useful if you could explain how these diagram stages work and provide context. 
   1. Do we provide a template Maven / Gradle project that contains the build plugin, which compiles to a WASM file?
   2. Why must we move to a resource directory to run the unit test? Does the build do it for us? Do we intend to provide a Unit test framework for the users to execute or author the tests?
   3. Can you expand on how the "LoadFromResource" works? Do you intend to use an open-source Java library that does that? Which library is it? 



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);

Review Comment:
   What exactly happens here? Does it load a virtual machine first that can execute WASM code? 



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Can you please elaborate more on how this handover is done?
   Can WASM functions have access to Java variables? How?
   Can the variables be of any type? How do you map a Java class to a WASM class?
   



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.

Review Comment:
   Can you expand on the last part of the sentence? What does it mean "the WASM Pulsar functions is under the Java Pulsar functions."?



##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {

Review Comment:
   I try to avoid offering inheritance as the mechanism for a user-facing API. It exposes too much knowledge to the user. 
   Can you explain why it is not possible for the user-facing API to be only an interface?
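
One way to picture the interface-only alternative raised in this comment is composition: the user implements a small interface, and a framework-side adapter holds the loaded module instead of inheriting from a loader. This is a sketch under stated assumptions, not the PIP's design. `WasmFunctionDescriptor` and `WasmFunctionAdapter` are hypothetical names, and a `LongUnaryOperator` stands in for the guest's exported `process` function so that no WASM runtime is needed.

```java
import java.util.Objects;
import java.util.function.LongUnaryOperator;

// Hypothetical user-facing contract: the user only declares which module to run.
interface WasmFunctionDescriptor {
    String wasmResourceName();
}

// Hypothetical framework-side adapter. The guest entry point is injected
// (composition) rather than reached through a base class (inheritance).
final class WasmFunctionAdapter {
    private final WasmFunctionDescriptor descriptor;
    private final LongUnaryOperator guestProcess; // stand-in for the exported "process"

    WasmFunctionAdapter(WasmFunctionDescriptor descriptor, LongUnaryOperator guestProcess) {
        this.descriptor = Objects.requireNonNull(descriptor, "descriptor");
        this.guestProcess = Objects.requireNonNull(guestProcess, "guestProcess");
    }

    // Forwards a numeric argument handle to the guest and returns its numeric result.
    long process(long argumentId) {
        return guestProcess.applyAsLong(argumentId);
    }

    String moduleName() {
        return descriptor.wasmResourceName();
    }
}
```

With this shape the user never sees loader internals, and the adapter can be tested with a plain lambda in place of a real module.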
   





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1994618910

   The description of [java-sdk](https://github.com/extism/java-sdk) shows that it also uses JSON for deserialization, and it requires installing additional dependencies, which is particularly inconvenient.




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1987314915

   FWIW
   
   I've been working on some POCs to enable WASM functions in Kafka Connect, Pulsar Functions and Apache Camel, where the focus was mainly to prove that [_Chicory_](https://github.com/dylibso/chicory), a pure Java WASM runtime, was good enough to be leveraged by Kafka Connect, Pulsar Functions and Camel:
   
   - https://lburgazzoli.github.io/posts/2024-02-01_apache_kafka_connect_meets_wasm_part_1/
   - https://github.com/lburgazzoli/kafka-connect-wasm-transformer
   - https://github.com/lburgazzoli/pulsar-function-wasm
   - https://lburgazzoli.github.io/posts/2024-01-14_apache_camel_meets_wasm_part_1/
   
   Because the focus was on _Chicory_, a number of assumptions have been made and no performance tests have been executed.
   
   The different POCs also have different implementations. For example, in the Apache Camel POC the whole message in transit is serialized/deserialized to/from the WASM guest function, whereas in the Kafka Connect and Pulsar POCs the developer can leverage a number of exported functions to access individual parts of the native envelope (Message, Record, etc).
   
   I did explore the [component-model](https://component-model.bytecodealliance.org/introduction.html) and I think it will become the right thing to use, but since it was not supported by _Chicory_ and is also not widely supported across the WASM runtimes I know, I opted for a conservative approach where the only requirement is to support the WASM spec (WASI is not even required). The side effect is that an SDK would be really appreciated to reduce the boilerplate developers have to write.
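
The two handover styles described in this comment (passing the whole serialized message to the guest, versus keeping the message on the host and exposing accessor functions) can be sketched in plain Java. This is only an illustration under stated assumptions: `Envelope`, `WholeMessageCodec` and `HostRegistry` are hypothetical names, no WASM runtime is involved, and the "guest" is whatever code holds the byte array or the `long` handle.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical message envelope; not a Pulsar, Kafka Connect, or Camel type.
final class Envelope {
    final String key;
    final byte[] payload;

    Envelope(String key, byte[] payload) {
        this.key = key;
        this.payload = payload;
    }
}

// Style 1: serialize the whole envelope into one buffer that is copied to the guest.
final class WholeMessageCodec {
    static byte[] encode(Envelope e) {
        byte[] k = e.key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + k.length + e.payload.length);
        buf.putInt(k.length).put(k).put(e.payload); // [key length][key bytes][payload bytes]
        return buf.array();
    }

    static Envelope decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] k = new byte[buf.getInt()];
        buf.get(k);
        byte[] p = new byte[buf.remaining()];
        buf.get(p);
        return new Envelope(new String(k, StandardCharsets.UTF_8), p);
    }
}

// Style 2: keep the envelope on the host and hand the guest a numeric handle.
// The guest would call back into accessors like these through host functions.
final class HostRegistry {
    private final Map<Long, Envelope> live = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    long register(Envelope e) {
        long id = nextId.getAndIncrement();
        live.put(id, e);
        return id;
    }

    byte[] getKey(long id) {
        return require(id).key.getBytes(StandardCharsets.UTF_8);
    }

    byte[] getPayload(long id) {
        return require(id).payload;
    }

    // Must be called once the guest call returns, or the map grows per message.
    void release(long id) {
        live.remove(id);
    }

    int size() {
        return live.size();
    }

    private Envelope require(long id) {
        Envelope e = live.get(id);
        if (e == null) {
            throw new IllegalArgumentException("unknown handle " + id);
        }
        return e;
    }
}
```

The first style pays a full copy per message; the second copies only the fields the guest asks for, at the cost of one host call per field and explicit handle cleanup.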
   




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988554829

   > Before adding a new type for Pulsar Functions, I'd rather have Pulsar Functions support pluggable runtimes so that anyone could implement a new runtime time such as WASM support. It's referenced in my reply to the dev mailing list: https://lists.apache.org/thread/cqshs3h1cxn8o4yh38p3zslymfl8wgdn . The "Pluggable Pulsar Functions runtime" thread is https://lists.apache.org/thread/hcnpytky4bg4fd1xh1p4pbqbjxbv9rdg . I fully support adding support for WASM, but after adding pluggable Pulsar Functions runtime. The reason for this is that the Pulsar Functions code base is already very cluttered and there would be a benefit to make adding new runtime types possible without adding each type one-by-one making the code base even worse than it is today.
   
   +1 
   
   I'm quite new to Pulsar, but reading through the mailing list, it seems that it would probably also help to have optimized support for the function type, which would probably be very useful for WASM. 




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1518882496


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   @lhotari @merlimat Serializing/deserializing each message - isn't this a killer for a function runtime? Or is it legit?





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988678044

   > Hi, I just read your previous email, my understanding is that you plan to have the pulsar server directly call the pluggable runtime, which means that pulsar will directly send messages to `pulsar-runtime-wasm`? Is this correct? @lhotari
   
   @loongs-zhang The pluggable runtime types in Pulsar Functions wouldn't change the existing architecture where the Pulsar Functions worker coordinates how Pulsar Functions are operated. The pluggable runtime types are about making the currently hardcoded JAVA, PYTHON and GO runtime types use the same pluggable runtime types SPI solution, or whatever it ends up being.
   (In addition, in the future there could also be new ways to run Pulsar Functions as standalone applications, completely outside of Pulsar Functions; this could be similar to the LocalRunner, but meant for production use.)
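
As a rough illustration of what "pluggable runtime types" could look like, the sketch below replaces a hardcoded set of runtime types with a registry of providers looked up by name. Everything here is an assumption for illustration: `FunctionRuntimeProvider`, `RuntimeRegistry` and `WasmRuntimeProvider` are hypothetical names, and the thread does not define an SPI. A real implementation might populate such a registry with `java.util.ServiceLoader`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical SPI: one provider per runtime type (for example "JAVA", "PYTHON", "GO", "WASM").
interface FunctionRuntimeProvider {
    String runtimeType();

    // Stand-in for whatever a real provider would do to start a function instance.
    String describeLaunch(String functionName);
}

// Hypothetical example provider; it only returns a description string.
final class WasmRuntimeProvider implements FunctionRuntimeProvider {
    @Override
    public String runtimeType() {
        return "WASM";
    }

    @Override
    public String describeLaunch(String functionName) {
        return "launch " + functionName + " in a WASM runtime";
    }
}

// Registry that the worker could consult instead of switching on a fixed set of types.
final class RuntimeRegistry {
    private final Map<String, FunctionRuntimeProvider> providers = new ConcurrentHashMap<>();

    void register(FunctionRuntimeProvider provider) {
        FunctionRuntimeProvider previous = providers.putIfAbsent(provider.runtimeType(), provider);
        if (previous != null) {
            throw new IllegalStateException("duplicate runtime type: " + provider.runtimeType());
        }
    }

    Optional<FunctionRuntimeProvider> lookup(String runtimeType) {
        return Optional.ofNullable(providers.get(runtimeType));
    }
}
```

The point of the shape is that adding a runtime means registering one more provider, without touching the code that dispatches on the runtime type.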




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1495396614


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.

Review Comment:
   Then you need to mention that and expand on it.





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1495399544


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Serialization sounds slow, as it would have to be done per message, no?
   





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1494052381


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design

Review Comment:
   > It would be very useful if you could explain how these diagram stages work and provide context.
   
   > 1. Do we provide a template Maven / Gradle project that contains the build plugin, which compiles to a WASM file?
   
   I think this should be done. Different languages require different tools to compile into WASM, which means multiple template projects need to be provided.
   
   > 2. Why must we move to a resource directory to run the unit test?
   
   Just so that the WASM file is loaded correctly.
   
   > Does the build do it for us? Do we intend to provide a Unit test framework for the users to execute or author the tests?
   
   It is impossible to provide a unit testing framework, because different languages require different tools to compile into WASM. A GitHub workflow could be provided, though.
   
   > 3. Can you expand on how the "LoadFromResource" works?
   
   It just loads the WASM file; see `WasmLoader#WasmLoader` in https://github.com/apache/pulsar/pull/21975.
   
   > Do you intend to use an open-source Java library that does that? Which library is it?
   
   [wasmtime-java](https://github.com/kawamuray/wasmtime-java) and [chicory](https://github.com/dylibso/chicory)
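
As a rough illustration of the "LoadFromResource" step discussed above, the sketch below reads a `.wasm` file from the classpath and checks the standard WebAssembly magic bytes before the bytes would be handed to a runtime. It is not the `WasmLoader` from the linked PR: the class name `WasmResourceSketch` and both method names are hypothetical, and no wasmtime-java or chicory API is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not the PR's WasmLoader.
final class WasmResourceSketch {

    // Every WebAssembly binary module starts with the bytes "\0asm".
    private static final byte[] WASM_MAGIC = {0x00, 0x61, 0x73, 0x6D};

    // True if the buffer is long enough for magic + version and starts with the magic.
    static boolean looksLikeWasm(byte[] bytes) {
        return bytes.length >= 8
                && Arrays.equals(Arrays.copyOfRange(bytes, 0, 4), WASM_MAGIC);
    }

    // Reads the named classpath resource fully and rejects anything that is not WASM.
    static byte[] loadFromResource(String resourceName) throws IOException {
        ClassLoader loader = WasmResourceSketch.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("WASM resource not found: " + resourceName);
            }
            byte[] bytes = in.readAllBytes();
            if (!looksLikeWasm(bytes)) {
                throw new IOException("Not a WASM binary: " + resourceName);
            }
            return bytes;
        }
    }
}
```

Failing early on a missing or malformed resource is one way to explain why the file has to sit in the resource directory for the unit tests to run.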





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516140574


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.

Review Comment:
   I have looked at the implementation of [wasmtime-java](https://github.com/kawamuray/wasmtime-java). Under the hood it is [wasmtime](https://github.com/bytecodealliance/wasmtime) (see [Cargo.toml](https://github.com/kawamuray/wasmtime-java/blob/master/wasmtime-jni/Cargo.toml)) wrapped in JNI. Therefore, I believe it is stable.





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1987290286

   Honestly, I'm still completely lost. I'll start with one item I have and we'll continue from there:
   
   What I expect from this PIP is an explanation of how to author a function that accepts a Message and returns a Message or a List of Message. 
   To do that, we first need types that are identical in every language in which I might author my function: types to represent a Message (in) and a list of Message (out).
   You mentioned serializing to byte[] and from it to something, but I couldn't figure out what that type is or how it is common to all languages.
   
   Side note: This PIP's goal is awfully complex. Conveying all the high-level information in 15 sentences is quite hard, unless you provide a list of prerequisite reading material that the reader should know before starting. I don't recommend it.
   
   I did some reading. It seems that there is something called the Component Model. It lets you define, in a text file written in a special language they invented, a function, its arguments, and most importantly its types, which can be complex data structures. See here: https://component-model.bytecodealliance.org/introduction.html. From my understanding, we can compile this text file into a set of interfaces in many languages (Java, C#, Go, ...), and the user authoring a function will pick the one for their language and have native types representing `Message` and the other things we need.
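
To make the "serialize to byte[]" question concrete, here is a minimal sketch of one possible language-neutral layout: a length-prefixed key followed by a length-prefixed payload, in little-endian order (the byte order of WASM linear memory). This layout is purely an assumption for illustration. It is neither what the PIP specifies nor what the Component Model generates, and `MessageCodecSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical wire layout: [keyLen:int32][key bytes][payloadLen:int32][payload bytes].
final class MessageCodecSketch {

    // Host side: flatten the fields a guest function needs into one byte[].
    static byte[] encode(String key, byte[] payload) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + keyBytes.length + 4 + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(keyBytes.length).put(keyBytes).putInt(payload.length).put(payload);
        return buf.array();
    }

    // Any language can decode the same layout; shown here in Java for symmetry.
    static String decodeKey(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN);
        byte[] keyBytes = new byte[buf.getInt()];
        buf.get(keyBytes);
        return new String(keyBytes, StandardCharsets.UTF_8);
    }

    static byte[] decodePayload(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN);
        int keyLen = buf.getInt();
        buf.position(buf.position() + keyLen); // skip over the key bytes
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return payload;
    }
}
```

With a hand-rolled layout like this, every guest language needs its own matching decoder, which is the maintenance burden that generated bindings are meant to remove.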
   
   




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988619662

   > > In addition, there is an awesome WASM plugin system https://github.com/extism/extism which I did not take into account because as said already, I did want to prove the _Chicory_ APIs and also because _Chicory_ is not yet supported
   > 
   > Seems quite nice. I couldn't really figure out there how or where you define the "interface" of your plugin, which the users will implement.
   
   You still have to provide it in the form of an SDK; Extism saves you all the boilerplate of dealing with memory management and so on. 




Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lburgazzoli (via GitHub)" <gi...@apache.org>.
lburgazzoli commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988503121

   In addition, there is an awesome WASM plugin system https://github.com/extism/extism which I did not take into account because as said already, I did want to prove the _Chicory_ APIs and also because _Chicory_ is not yet supported




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1495396150


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.

Review Comment:
   I think there is a problem here. I can't learn about this feature you mentioned from the documentation of wasmtime-java since it's non-existent, and the scarce documentation in this PIP doesn't help. How can this be maintained by other people going forward?





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493959903


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.

Review Comment:
   Yes, there is.





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1494058344


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {

Review Comment:
   We must load the WASM file and call the WASM function, otherwise it is meaningless. Abstract classes can ensure loading and calling, but interfaces cannot.
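
The argument for an abstract class can be sketched as a template method: the base class owns the guest call and marks `process` as `final`, so a subclass cannot skip it. This is only an illustration of that reasoning, not the PIP's `AbstractWasmFunction`. All names are hypothetical, and a `LongUnaryOperator` stands in for the exported WASM function.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical base class: guarantees that the guest function is present and invoked.
abstract class GuestBackedFunctionSketch {

    // Stand-in for a function exported by a loaded WASM module.
    private final LongUnaryOperator guestProcess;

    protected GuestBackedFunctionSketch(LongUnaryOperator guestProcess) {
        if (guestProcess == null) {
            throw new IllegalStateException("guest module not loaded");
        }
        this.guestProcess = guestProcess;
    }

    // final: every call goes through the guest before subclass code runs.
    public final String process(long input) {
        long guestResult = guestProcess.applyAsLong(input);
        return toOutput(input, guestResult);
    }

    // Subclasses only decide how to turn the guest result into an output.
    protected abstract String toOutput(long input, long guestResult);
}

// Example subclass whose "guest" simply doubles its input.
final class DoublingFunctionSketch extends GuestBackedFunctionSketch {

    DoublingFunctionSketch() {
        super(x -> x * 2);
    }

    @Override
    protected String toOutput(long input, long guestResult) {
        return input + " -> " + guestResult;
    }
}
```

An interface with default methods could offer the same convenience, but an implementer could override the default and bypass the guest call entirely.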



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "RobertIndie (via GitHub)" <gi...@apache.org>.
RobertIndie commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1496859025


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));
+        // WASI cannot easily pass Java objects like JNI, here we pass Long
+        // then we can get the argument by Long
+        WasmFunctions.consumer(super.getStore(), process.func(), WasmValType.I64)
+                .accept(argumentId);
+        ARGUMENTS.remove(argumentId);
+        return argumentId;
+    }
+
+    protected abstract T doProcess(X input, Context context, Long argumentId);
+
+    protected abstract Long getArgumentId(X input, Context context);
+
+    @Override
+    public void initialize(Context context) {
+        super.getWasmExtern(INITIALIZE_METHOD_NAME)
+                .ifPresent(initialize -> callWASI(null, context, initialize));
+    }
+
+    @Override
+    public void close() {
+        super.getWasmExtern(CLOSE_METHOD_NAME)
+                .ifPresent(close -> callWASI(null, null, close));
+        super.close();
+    }
+
+    protected static class Argument<X> {
+        protected X input;
+        protected Context context;
+
+        private Argument(X input, Context context) {
+            this.input = input;
+            this.context = context;
+        }
+    }
+}
+```
+
+More detailed code implementation and test can be found in [here](https://github.com/apache/pulsar/pull/21975)

Review Comment:
   It's better to extract more design details from the code implementation to this proposal.





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516163585


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Well, WASM can easily read a `byte[]` passed from other languages, but it cannot directly convert it into a Java object; in that respect it is far inferior to JNI. JNI's cross-platform story, however, is much worse than WASM's (imagine a user who relies on JNI and C to go cross-platform and then has to redo the work after switching to Golang). Any suggestions?
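
The `ARGUMENTS` map in the quoted code follows a handle-table pattern: the host parks the real argument under a numeric id, passes only the id across the boundary, and lets the guest call back to fetch bytes by id. A minimal standalone sketch of that pattern is below. `ArgumentRegistrySketch` is a hypothetical name, a `LongConsumer` stands in for the exported guest function, and `lookup` stands in for a host function the guest would import.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical handle table: only a long id crosses the host/guest boundary.
final class ArgumentRegistrySketch {

    private final Map<Long, byte[]> arguments = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Host function the guest would import to fetch the bytes behind an id.
    byte[] lookup(long id) {
        return arguments.get(id);
    }

    // Registers the input, invokes the guest with its id, and always unregisters it.
    void invoke(byte[] input, LongConsumer guestProcess) {
        long id = nextId.incrementAndGet();
        arguments.put(id, input);
        try {
            guestProcess.accept(id);
        } finally {
            arguments.remove(id); // avoid leaking entries if the guest call throws
        }
    }

    // Number of arguments currently registered; useful for leak checks in tests.
    int pending() {
        return arguments.size();
    }
}
```

Unlike the quoted `callWASI`, this sketch removes the entry in a `finally` block, so a failing guest call does not leave stale arguments in the map.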





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1518886656


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.

Review Comment:
   You haven't mentioned it yet.





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988207401

   @lburgazzoli Brilliant. Can you explain how the function accesses the data in a `Message` object in the WASM function, regardless of the language it is written in? Can you also explain how this differs from the Component Model? 
   Is it reasonable to just wait for the Component Model to be implemented in a runtime that can be used from Java, instead of supporting two flavors? 




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516163585


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Well, WASM can easily read a `byte[]` passed in from another language, but it cannot directly convert it into a Java object; in that respect it is far inferior to JNI. JNI's cross-platform story, however, is much worse than WASM's (imagine a user who relies on JNI and C for cross-platform support: after switching to Golang, they have to do the work all over again). Any suggestions?





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516140574


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.

Review Comment:
   I have looked at the implementation of [wasmtime-java](https://github.com/kawamuray/wasmtime-java): the underlying runtime is [wasmtime](https://github.com/bytecodealliance/wasmtime) (see [Cargo.toml](https://github.com/kawamuray/wasmtime-java/blob/master/wasmtime-jni/Cargo.toml)), wrapped in JNI.






Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1516143537


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.

Review Comment:
   Sorry, I will update the document this weekend.





Re: [PR] [improve][pip] PIP-336: WASM Support for pulsar-function-java [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#issuecomment-1988539988

   Before adding a new type for Pulsar Functions, I'd rather have Pulsar Functions support pluggable runtimes so that anyone could implement a new runtime type, such as WASM support. This is referenced in my reply on the dev mailing list: https://lists.apache.org/thread/cqshs3h1cxn8o4yh38p3zslymfl8wgdn .
   The "Pluggable Pulsar Functions runtime" thread is https://lists.apache.org/thread/hcnpytky4bg4fd1xh1p4pbqbjxbv9rdg .
   I fully support adding WASM support, but only after a pluggable Pulsar Functions runtime has been added.
   The reason is that the Pulsar Functions code base is already very cluttered. It would be beneficial to make new runtime types possible without adding each type one by one, which would make the code base even worse than it is today.
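
To make the "pluggable runtime" idea concrete, here is a rough sketch of what a registry of runtime providers could look like. Everything in it (`FunctionRuntimeProvider`, `RuntimeRegistry`, the `"wasm"` type name) is a hypothetical name chosen for illustration; none of it is an existing Pulsar API or the design from the linked thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Resolves a provider by runtime type, so core code does not hard-code each runtime. */
final class RuntimeRegistry {
    private final Map<String, FunctionRuntimeProvider> providers = new ConcurrentHashMap<>();

    /** Registers a provider; a second provider for the same type is rejected. */
    void register(FunctionRuntimeProvider provider) {
        FunctionRuntimeProvider previous = providers.putIfAbsent(provider.runtimeType(), provider);
        if (previous != null) {
            throw new IllegalStateException("runtime already registered: " + provider.runtimeType());
        }
    }

    /** Returns the provider for the given type, or empty if none is registered. */
    Optional<FunctionRuntimeProvider> lookup(String runtimeType) {
        return Optional.ofNullable(providers.get(runtimeType));
    }

    public static void main(String[] args) {
        RuntimeRegistry registry = new RuntimeRegistry();
        registry.register(new FunctionRuntimeProvider() {
            @Override public String runtimeType() { return "wasm"; }
            @Override public String describe(String functionName) { return "wasm:" + functionName; }
        });
        System.out.println(registry.lookup("wasm")
                .map(p -> p.describe("echo"))
                .orElse("no such runtime"));
    }
}

/** Hypothetical SPI: one implementation per runtime type (illustrative only). */
interface FunctionRuntimeProvider {
    String runtimeType();                  // e.g. "java" or "wasm"
    String describe(String functionName);  // stand-in for "create a runtime instance"
}
```

In a real design the providers would more likely be discovered through `java.util.ServiceLoader` than registered by hand; the explicit `register` call just keeps the sketch self-contained.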
   
   
   
   
   
   
   




Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1518882193


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   Do you have any idea what the performance implications of this step are? How does the throughput of a normal Java function compare with that of this WASM function?





Re: [PR] [improve][pip] PIP-336: WASM Function API [pulsar]

Posted by "loongs-zhang (via GitHub)" <gi...@apache.org>.
loongs-zhang commented on code in PR #21992:
URL: https://github.com/apache/pulsar/pull/21992#discussion_r1493970517


##########
pip/pip-331.md:
##########
@@ -0,0 +1,129 @@
+# PIP-331: WASM Function API
+
+# Background knowledge
+
+WASM(WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WASM aims to leverage the common hardware features available on various platforms to execute in browsers at machine code speed.
+
+WASI(WebAssembly System Interface) provide a portable interface for applications that run within a constrained sandbox environment, which allows WASM to run in non browser environments such as Linux. It's portable and secure.
+
+# Motivation
+
+The server and client sides of the Pulsar function use protobuf for decoupling. In principle, the language supported by protobuf can be supported by the pulsar function, now Pulsar provided the java, python and golang function client, but there are still many languages that are not supported.
+
+Before all language adaptations are completed (and it's almost entirely certain to be impossible), users cannot write pulsar function in their familiar languages.
+
+# Goals
+
+## In Scope
+
+Other languages, as long as their code can be compiled into WASM bytecode (such as Rust/golang/C++), users can use these languages to write pulsar function.
+
+## Out of Scope
+
+All existing abilities of the Java pulsar function client are not reimplemented, the WASM Pulsar functions is under the Java Pulsar functions.
+
+Due to the strict requirements of WASM on parameter types and for simplicity reasons, types other than `java.lang.Long` are not used as parameters or return value.
+
+# High Level Design
+
+```mermaid 
+flowchart LR;
+
+    subgraph develop
+        direction TB
+        SourceCode ==> |"CompileToWASM"| WasmFile ==> |"RenameFile"| MoveToTheResourceDirectory ==> UnitTest
+    end
+
+    subgraph runtime
+        direction TB
+        PulsarFunctionJava ==> |"LoadFromResource"| TheWasmFile ==> |"Invoke"| TheSourceCode
+    end
+    
+    develop --> runtime
+```
+
+# Detailed Design
+
+## Design & Implementation Details
+
+1. add `WasmLoader` to load WASM file and provide the WASM function to java, also provide the java function to WASM if we need.
+
+2. add `AbstractWasmFunction` and `AbstractWasmWindowFunction` as the core interface of the WASM function api.
+
+```java
+public abstract class AbstractWasmFunction<X, T> extends WasmLoader implements Function<X, T> {
+
+    private static final String PROCESS_METHOD_NAME = "process";
+
+    protected static final String INITIALIZE_METHOD_NAME = "initialize";
+
+    protected static final String CLOSE_METHOD_NAME = "close";
+
+    protected static final Map<Long, Argument<?>> ARGUMENTS = new ConcurrentHashMap<>();
+
+    @Override
+    public T process(X input, Context context) {
+        return super.getWasmExtern(PROCESS_METHOD_NAME)
+                .map(process -> {
+                    Long argumentId = callWASI(input, context, process);
+                    return doProcess(input, context, argumentId);
+                })
+                .orElseThrow(() -> new PulsarWasmException(
+                        PROCESS_METHOD_NAME + " function not found in " + super.getWasmName()));
+    }
+
+    private Long callWASI(X input,
+                          Context context,
+                          Extern process) {
+        // call WASI function
+        final Long argumentId = getArgumentId(input, context);
+        ARGUMENTS.put(argumentId, new Argument<>(input, context));

Review Comment:
   > Can you please elaborate more on how this handover is done?
   
   Java passes a parameter ID (`java.lang.Long`) to the other language, and the other language then obtains the raw parameters through that parameter ID.
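
The ID-based handover can be sketched in plain Java as follows. This is only an illustration under assumptions: `ArgumentRegistry` and its method names are hypothetical and are not the `ARGUMENTS` map or `getArgumentId` from the PIP; the sketch just shows the register / resolve / release life cycle that such a handover implies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical host-side table: the guest only ever sees the numeric ID. */
final class ArgumentRegistry<T> {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, T> pending = new ConcurrentHashMap<>();

    /** Stores the argument and returns the ID that is handed to the guest. */
    long register(T argument) {
        long id = nextId.getAndIncrement();
        pending.put(id, argument); // ConcurrentHashMap rejects null arguments
        return id;
    }

    /** Called when the guest calls back into the host with an ID. */
    T resolve(long id) {
        T argument = pending.get(id);
        if (argument == null) {
            throw new IllegalArgumentException("unknown argument id: " + id);
        }
        return argument;
    }

    /** Removes the entry once the call has finished, so the table does not grow forever. */
    void release(long id) {
        pending.remove(id);
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        ArgumentRegistry<String> registry = new ArgumentRegistry<>();
        long id = registry.register("payload");
        try {
            // Stand-in for the guest calling back with the ID it was given.
            System.out.println(registry.resolve(id));
        } finally {
            registry.release(id);
        }
    }
}
```

Releasing the entry in a `finally` block matters: if entries are only ever added, a static map like this leaks one entry per processed message.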
   
   > Can WASM functions have access to Java variables? How?
   
   Step 1: export Java functions to the other language.
   
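   ```java
   import io.github.kawamuray.wasmtime.Func;
   import io.github.kawamuray.wasmtime.Store;
   import io.github.kawamuray.wasmtime.WasmFunctions;
   import io.github.kawamuray.wasmtime.WasmValType;
   import java.nio.ByteBuffer;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   
   // AbstractWasmFunction is the class proposed in this PIP.
   public class WasmFunction extends AbstractWasmFunction<String, String> {
   
       private static final Map<Long, String> RESULTS = new ConcurrentHashMap<>();
   
       @Override
       protected Map<String, Func> initWasmCallJavaFunc(Store<Void> store) {
           Map<String, Func> funcMap = new HashMap<>();
           // get_args(argId: i64, addr: i64, len: i32) -> i32
           funcMap.put("get_args",
                   WasmFunctions.wrap(store, WasmValType.I64, WasmValType.I64, WasmValType.I32, WasmValType.I32,
                           (argId, addr, len) -> {
                               String config = "hello from java " + argId;
                               System.out.println("java side->" + config);
                               // Linear memory shared with the WASM module.
                               ByteBuffer buf = super.getBuffer();
                               // Copy at most `len` characters into the guest buffer at `addr`.
                               // Casting char to byte is only safe for ASCII content.
                               for (int i = 0; i < len && i < config.length(); i++) {
                                   buf.put(addr.intValue() + i, (byte) config.charAt(i));
                               }
                               // Tell the guest how many bytes were written.
                               return Math.min(config.length(), len);
                           }));
           //......
           return funcMap;
       }
   }
   ```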
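
For strings that are not pure ASCII, the copy into guest memory probably wants to work on encoded bytes rather than on `char`s. A minimal sketch using only the JDK is below; `GuestMemoryWriter` and `writeUtf8` are hypothetical names, and a plain `ByteBuffer` stands in for the module's linear memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class GuestMemoryWriter {

    /**
     * Copies at most maxLen bytes of the UTF-8 encoding of value into memory,
     * starting at addr, and returns the number of bytes written.
     */
    static int writeUtf8(ByteBuffer memory, int addr, int maxLen, String value) {
        // Reject ranges that fall outside the buffer instead of writing partially.
        if (addr < 0 || maxLen < 0 || addr > memory.capacity() - maxLen) {
            throw new IndexOutOfBoundsException(
                    "guest buffer out of range: addr=" + addr + ", len=" + maxLen);
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(bytes.length, maxLen);
        for (int i = 0; i < n; i++) {
            memory.put(addr + i, bytes[i]); // absolute put, does not move the position
        }
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64); // stand-in for WASM linear memory
        int written = writeUtf8(memory, 8, 32, "hello from java 1");
        byte[] out = new byte[written];
        for (int i = 0; i < written; i++) {
            out[i] = memory.get(8 + i);
        }
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

Note that truncating at `maxLen` can cut a multi-byte character in half, so the guest side should either provide a large enough buffer or be prepared for invalid UTF-8 at the end.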
   
   Step 2: Java calls into WASM via `AbstractWasmFunction#process`.
   
   Step 3: the call arrives in WASM at the exported function shown below. This example is written in Rust, which can be compiled to WASM.
   
   ```rust
   #[no_mangle]
   pub unsafe extern "C" fn process(arg_id: i64) {
       //......
   }
   ```
   
   Step 4: Java passes the arguments to the other language.
   
   ```rust
   #[link(wasm_import_module = "pulsar")]
   extern "C" {
       fn get_args(arg_id: i64, addr: i64, len: i32) -> i32;
   }
   
   #[no_mangle]
   pub unsafe extern "C" fn process(arg_id: i64) {
       let mut buf = [0u8; 32];
       let buf_ptr = buf.as_mut_ptr() as i64;
       // get arg from java
       let len = get_args(arg_id, buf_ptr, buf.len() as i32);
       let java_arg = std::str::from_utf8(&buf[..len as usize]).unwrap();
       eprintln!("rust side-> recv:{}", java_arg);
       // ......
   }
   ```
   
   > Can the variables be of any type?
   
   No; only the types listed in `io.github.kawamuray.wasmtime.WasmValType` are supported.
   
   > How do you map a Java class to a WASM class?
   
   They cannot be mapped directly; serialization and deserialization are required.
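
As an illustration of what "serialization and deserialization" could mean here, the sketch below encodes a small Java object into a `byte[]` and decodes it again. The `OrderEvent` class and its wire layout (an 8-byte id, a 4-byte length, then UTF-8 bytes, all little-endian to match WASM linear memory) are assumptions made up for this example, not part of the PIP.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical payload type; the guest would decode the same layout on its side. */
final class OrderEvent {
    final long id;
    final String customer;

    OrderEvent(long id, String customer) {
        this.id = id;
        this.customer = customer;
    }

    /** Layout: i64 id | i32 name length | UTF-8 name bytes, little-endian. */
    byte[] toBytes() {
        byte[] name = customer.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + name.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(id).putInt(name.length).put(name);
        return buf.array();
    }

    /** Inverse of toBytes(); rejects a length prefix that exceeds the data. */
    static OrderEvent fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long id = buf.getLong();
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining()) {
            throw new IllegalArgumentException("invalid name length: " + len);
        }
        byte[] name = new byte[len];
        buf.get(name);
        return new OrderEvent(id, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        OrderEvent original = new OrderEvent(42L, "alice");
        OrderEvent copy = OrderEvent.fromBytes(original.toBytes());
        System.out.println(copy.id + " " + copy.customer);
    }
}
```

In practice an existing format such as protobuf or JSON may be a better fit than a hand-rolled layout, since the guest language then gets a ready-made decoder.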


