Posted to dev@beam.apache.org by Steven van Rossum <sj...@google.com> on 2022/06/14 21:22:22 UTC

Fun with WebAssembly transforms

Hi folks,

I had some spare time yesterday and thought it'd be fun to implement a
transform which runs WebAssembly modules as a lightweight way to implement
cross-language transforms for languages which don't (yet) have an SDK
implementation.

I've got a small proof of concept running in the Python SDK as a DoFn with
Wasmer as the WebAssembly runtime and simple support for marshalling
between the host and guest environment with the RowCoder. The module I've
constructed is mostly useless, but demonstrates the host copying the
encoded element into the guest's memory, the guest copying those bytes
elsewhere in its linear memory buffer, the guest calling back to the host
with the offset and size and the host copying and decoding from the guest's
memory.
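The round trip described above can be sketched in plain Python, with a `bytearray` standing in for the guest's linear memory. This is a simulation, not the proof of concept itself. `FakeGuest`, its bump allocator, and the fixed (int64, UTF-8 string) layout used in place of the RowCoder are all hypothetical, and no Wasm runtime such as Wasmer is involved.

```python
import struct

PAGE_SIZE = 64 * 1024  # one Wasm page is 64 KiB


class FakeGuest:
    """Hypothetical stand-in for a Wasm instance. It owns a linear memory and,
    when asked to process a region, copies it elsewhere and reports the new
    (offset, size) to the host through a callback (a host import in real Wasm)."""

    def __init__(self, host_callback):
        self.memory = bytearray(PAGE_SIZE)
        self.host_callback = host_callback
        self.heap_top = 1024  # hypothetical bump allocator start

    def alloc(self, size):
        offset = self.heap_top
        if offset + size > len(self.memory):
            raise MemoryError("guest linear memory exhausted")
        self.heap_top += size
        return offset

    def process(self, offset, size):
        out = self.alloc(size)
        # Guest copies the encoded bytes elsewhere in its own memory...
        self.memory[out:out + size] = self.memory[offset:offset + size]
        # ...then calls back into the host with the new location.
        self.host_callback(out, size)


def encode(element):
    # Stand-in for RowCoder: int64 id, uint32 length, then UTF-8 name bytes.
    name = element["name"].encode("utf-8")
    return struct.pack("<qI", element["id"], len(name)) + name


def decode(buf):
    ident, n = struct.unpack_from("<qI", buf, 0)
    header = struct.calcsize("<qI")  # 12 bytes
    return {"id": ident, "name": buf[header:header + n].decode("utf-8")}


def run_element(element):
    outputs = []

    def on_output(offset, size):
        # Host copies the bytes out of guest memory, then decodes them.
        outputs.append(decode(bytes(guest.memory[offset:offset + size])))

    guest = FakeGuest(on_output)
    payload = encode(element)
    in_offset = guest.alloc(len(payload))
    guest.memory[in_offset:in_offset + len(payload)] = payload  # host -> guest copy
    guest.process(in_offset, len(payload))
    return outputs
```

`run_element({"id": 7, "name": "beam"})` goes through both copies and returns the decoded element unchanged. In a real module, the host would presumably write into the instance's exported memory, and the callback would be a host function imported by the guest.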

Any thoughts/interest? I'm not sure where I was going with this, since it
was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but I
can see a few use cases for this.

Regards,

Steve

Steven van Rossum | Strategic Cloud Engineer | sjvanrossum@google.com | (+31) (0)6 21174069

*Google Netherlands B.V.*

*Reg: Claude Debussylaan 34, 15th floor, 1082 MD Amsterdam, 34198589, NETHERLANDS, VAT / Tax ID: 812788515 B01*


*If you received this communication by mistake, please don't forward it to
anyone else (it may contain confidential or privileged information), please
erase all copies of it, including all attachments, and please let the
sender know it went to the wrong person. Thanks.*

*The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.*

Re: Fun with WebAssembly transforms

Posted by Luke Cwik via dev <de...@beam.apache.org>.
Thanks Cham, I wasn't up to speed on where Xlang was with respect to those
transforms.

On Wed, Jul 13, 2022 at 9:32 PM Chamikara Jayalath <ch...@google.com>
wrote:

> +1 and this is exactly what I suggested as well. Python Dataframe,
> RunInference, Python Map are available via x-lang for Java already [1] and
> all three need/use simple UDFs to customize operation. There is some logic
> that needs to be added to use Python transforms from Go SDK. As you
> suggested there are many Java x-lang transforms that can use simple UDF
> support as well. Either language combination should work to implement a
> first proof of concept for WASI support while also addressing an existing
> limitation.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/tree/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms
>
> On Wed, Jul 13, 2022 at 8:26 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I agree with Luke. Little helper UDFs that go along with IOs, like
>> timestamp extractors that have to parse particular data formats, are
>> actually a major feature gap for xlang. Targeting them could be a very
>> useful place to try out the design options. I think we can simplify the
>> problem by insisting that they are pure functions that do not access state
>> or side inputs.
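To make "pure function" concrete, here is a hypothetical timestamp extractor of the kind described above. It is written in Python only for illustration; in the proposal it would be compiled to Wasm. The `event_time` field name and the JSON input format are assumptions. Its output depends only on its input bytes. It reads no state, no side inputs and not even the clock, so a host could run it on any element, in any order, or retry it freely.

```python
import json
from datetime import datetime, timezone


def extract_timestamp_millis(record: bytes) -> int:
    """Hypothetical pure UDF: parse a JSON record and return its event time
    in epoch milliseconds. It touches nothing but its argument."""
    doc = json.loads(record.decode("utf-8"))
    # Assumed field holding an ISO-8601 timestamp, e.g. "2022-07-13T20:26:00+00:00".
    ts = datetime.fromisoformat(doc["event_time"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return int(ts.timestamp() * 1000)
```

Because there is no hidden input, the same bytes always give the same timestamp, whichever SDK or runner hosts the function.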
>>
>> On Wed, Jul 13, 2022 at 7:52 PM Luke Cwik via dev <de...@beam.apache.org>
>> wrote:
>>
>>> I think an easier target would be to support things like
>>> DynamicDestinations for Java IO connectors that are exposed as XLang for
>>> Go/Python.
>>>
>>> This is because Go/Python have good transpiling support to WebAssembly,
>>> and we have already exposed several Java IO connectors via XLang, so it's
>>> about plumbing one more thing through for these IO connectors.
>>>
>>> What interface should we expect for UDFs / UDAFs? Should they be
>>> purpose-oriented, or should we do something like we did for portability,
>>> where we have a graph of transforms that we feed arbitrary data into and
>>> out of? The latter would have the benefit of allowing the runner to embed
>>> the language execution directly within the runner, paying the Wasm
>>> communication tax instead of the gRPC communication tax. If we do the
>>> former, we still have the same issues: we need a type system to pass
>>> information between the host system and the transpiled WebAssembly code
>>> that wraps the user's UDF/UDAF, and what if the UDF wants access to side
>>> inputs or user state ...
>>>
>>> On Wed, Jul 13, 2022 at 4:09 PM Chamikara Jayalath <ch...@google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> First we'll want to choose whether we want to target Wasm, WASI or
>>>>> Wagi.
>>>>>
>>>>
>>>> These terms are defined here
>>>> <https://www.fermyon.com/blog/wasm-wasi-wagi>
>>>> if anybody is as confused as I am :)
>>>>
>>>>
>>>>> WASI adds a lot of simple things like access to a clock, random number
>>>>> generator, ... that would expand the scope of what transpiled code can do.
>>>>> It is debatable whether we'll want the power to run the transpiled code as
>>>>> a microservice. Taking UDFs for XLang and UDFs and UDAFs for SQL as our
>>>>> expected use cases seems to make WASI the best choice. The issue is in the
>>>>> details, as there is a hodgepodge of what language runtimes support and of
>>>>> the limits of transpiling from each language to WebAssembly.
>>>>>
>>>>
>>>> Agree that WASI seems like a good target since it gives access to
>>>> additional system resources/tooling.
>>>>
>>>>
>>>>>
>>>>> Assuming WASI, it breaks down into these two aspects:
>>>>> 1) Does the host language have a runtime?
>>>>> Java: https://github.com/wasmerio/wasmer-java
>>>>> Python: https://github.com/wasmerio/wasmer-python
>>>>> Go: https://github.com/wasmerio/wasmer-go
>>>>>
>>>>> 2) How good is compilation from source language to WebAssembly
>>>>> <https://github.com/appcypher/awesome-wasm-langs>?
>>>>> Java (very limited):
>>>>> Issues with garbage collection and the need to transpile/replace much
>>>>> of the VM's capabilities, plus the large standard library that everyone
>>>>> uses, cause a lot of challenges.
>>>>> JWebAssembly can do simple things like basic classes, strings and method
>>>>> calls, so it should be able to compile trivial lambdas to Wasm. There are
>>>>> other choices, but to my knowledge all are very limited.
>>>>>
>>>>
>>>> That's unfortunate. But hopefully Java support will be implemented soon?
>>>>
>>>>
>>>>>
>>>>> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
>>>>> Features                 | CPython Emscripten browser | CPython Emscripten node | Pyodide
>>>>> ------------------------ | -------------------------- | ----------------------- | ------------------------
>>>>> subprocess (fork, exec)  | no                         | no                      | no
>>>>> threads                  | no                         | YES                     | WIP
>>>>> file system              | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
>>>>> shared extension modules | WIP                        | WIP                     | YES
>>>>> PyPI packages            | no                         | no                      | YES
>>>>> sockets                  | ?                          | ?                       | ?
>>>>> urllib, asyncio          | no                         | no                      | WebAPI fetch / WebSocket
>>>>> signals                  | no                         | WIP                     | YES
>>>>>
>>>>> Go (excellent): native support in the Go compiler
>>>>>
>>>>
>>>> Great. Could executing Go UDFs in Python x-lang transforms (for
>>>> example, Dataframe, RunInference, Python Map) be a good first target?
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>>
>>>>>
>>>>> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> I have had interest in integrating Wasm within Beam as well as I
>>>>>>> have had a lot of interest in improving language portability.
>>>>>>>
>>>>>>> Wasm has a lot of benefits over using docker containers to provide a
>>>>>>> place for code to execute. From my experience working on Beam's
>>>>>>> portability layer and my knowledge of internal Flume:
>>>>>>> * encoding and decoding data is expensive; anything which allows
>>>>>>> in-memory representations of data to be transferred from the host to
>>>>>>> the guest and back without transcoding/re-interpreting will be a big win.
>>>>>>> * reducing the number of times we need to pass data between guest
>>>>>>> and host and back is important
>>>>>>>   * fusing transforms reduces the number of data passing points
>>>>>>>   * batching (row or columnar) data reduces the number of times we
>>>>>>> need to pass data at each data passing point
>>>>>>> * there are enough complicated use cases (state & timers, large
>>>>>>> iterables, side inputs) that handling only the trivial map/flatmap use
>>>>>>> case will provide little value, since it will prevent fusion
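The effect of the two sub-bullets on fusion and batching can be shown with a toy counter. In this sketch `CountingBoundary` is a hypothetical stand-in for one host/guest crossing (a copy in plus a copy out). It only counts calls and does not model real Wasm or gRPC costs.

```python
from typing import Callable, Iterable, List


class CountingBoundary:
    """Hypothetical host/guest boundary that counts how often data crosses it."""

    def __init__(self, fn: Callable[[List[int]], List[int]]):
        self.fn = fn
        self.crossings = 0

    def call(self, batch: List[int]) -> List[int]:
        self.crossings += 1  # each call = one hand-off into the guest and back
        return self.fn(batch)


def run(elements: Iterable[int], boundary: CountingBoundary, batch_size: int) -> List[int]:
    """Send elements through the boundary in batches of up to batch_size."""
    out: List[int] = []
    batch: List[int] = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            out.extend(boundary.call(batch))
            batch = []
    if batch:  # flush the final partial batch
        out.extend(boundary.call(batch))
    return out


def add_one(batch):
    return [x + 1 for x in batch]


def double(batch):
    return [x * 2 for x in batch]


def fused(batch):
    # add_one followed by double, fused into a single guest invocation.
    return double(add_one(batch))
```

For 10 elements, running `add_one` and `double` as separate, unbatched guest calls crosses the boundary 20 times. Fusing them gives 10 crossings, and fusing plus batches of 4 gives 3, with identical output.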
>>>>>>>
>>>>>>> I have been meaning to work on a prototype where we replace the
>>>>>>> current gRPC + docker path with one in which we use Wasm to execute a fused
>>>>>>> graph re-using large parts of the existing code base written to support
>>>>>>> portability.
>>>>>>>
>>>>>>
>>>>>> This sounds very interesting. Probably using Wasm to implement proper
>>>>>> UDF support for x-lang (for example, executing Python timestamp/watermark
>>>>>> functions provided through the Kafka Python x-lang wrapper on the Java
>>>>>> Kafka transform) would be a good first target? My main question at
>>>>>> this point is whether Wasm has adequate support for the existing SDKs that
>>>>>> use x-lang to implement this in a useful way.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>>>>>> Beam [1]. I'm trying to move us in that direction with
>>>>>>>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in
>>>>>>>> the Future Work section). This gives the Python SDK a concept of batches of
>>>>>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>>>>>> Arrow RecordBatches.
>>>>>>>>
>>>>>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release
>>>>>>>> cut and I'm currently working on adding what I'm calling a "BatchConverter"
>>>>>>>> [2] for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>>>>>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>>>>>> [2]
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
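The row-to-columnar reshaping that a Beam Rows -> Arrow RecordBatch converter performs can be illustrated without Arrow. This stdlib-only sketch is not Beam's `BatchConverter` API and does not build real Arrow arrays. It only shows the transposition from one record per element to one array per field, with `None` standing in for a null.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]], fields: List[str]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one list per field, which is the
    shape a columnar batch stores (one array per column)."""
    columns: Dict[str, List[Any]] = {name: [] for name in fields}
    for row in rows:
        for name in fields:
            columns[name].append(row.get(name))  # missing field -> None (null)
    return columns


def columns_to_rows(columns: Dict[str, List[Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Inverse transposition: rebuild one dict per element."""
    n = len(columns[fields[0]]) if fields else 0
    return [{name: columns[name][i] for name in fields} for i in range(n)]
```

With a columnar layout, a Wasm guest could in principle receive each field as one contiguous buffer per batch instead of decoding element by element.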
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <
>>>>>>>> jensengrey@google.com> wrote:
>>>>>>>>
>>>>>>>>> Interesting.
>>>>>>>>>
>>>>>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>>>>>> "golang wasm" :)
>>>>>>>>>
>>>>>>>>> The storage and execution grid systems are all embracing wasm in
>>>>>>>>> some way.
>>>>>>>>>
>>>>>>>>> https://redpanda.com/
>>>>>>>>> https://www.fluvio.io/
>>>>>>>>> https://temporal.io/ (a Cadence fork by the Cadence folks; I met
>>>>>>>>> Maxim, the lead at Temporal, at the 2020 Wasm Summit)
>>>>>>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>>>>>>
>>>>>>>>> Keep the Wasm+Beam demos coming.
>>>>>>>>>
>>>>>>>>> Sean
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I caught up with all the replies through the web interface, but I
>>>>>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>>>>>>>> code available at https://github.com/sjvanrossum/beam-wasm)
>>>>>>>>>> didn't come through until a bit later yesterday I think.
>>>>>>>>>>
>>>>>>>>>> Sean, I agree with your suggestion of Arrow as the interchange
>>>>>>>>>> format for Wasm transforms and it's something I thought about exploring
>>>>>>>>>> when I was adding serialization/deserialization of complex (meaning
>>>>>>>>>> anything that's not an integer or float in the context of Wasm) data types
>>>>>>>>>> in the demo. It's an unfortunate bit of overhead which could very well be
>>>>>>>>>> solved with Arrow and shared memory between Wasm modules.
>>>>>>>>>> I've seen Wasm transforms pop up in a few other places, notably
>>>>>>>>>> in streaming data platforms like Fluvio and Redpanda, and they seem to
>>>>>>>>>> incur the same overhead when moving data into and out of the guest
>>>>>>>>>> context, so maybe it's negligible, but I haven't done any serious
>>>>>>>>>> benchmarks yet to validate that.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Obligatory mention that WASM is basically an architecture that
>>>>>>>>>>> any well-meaning compiler can target, e.g. the Go compiler
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>>>>>
>>>>>>>>>>> (Among many articles for the last few years)
>>>>>>>>>>>
>>>>>>>>>>> Robert Burke
>>>>>>>>>>> Beam Go Busybody
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <
>>>>>>>>>>> jensengrey@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the
>>>>>>>>>>>> talk was recorded. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a
>>>>>>>>>>>> bit rough.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think
>>>>>>>>>>>> the most compelling use is the portability of logic between data
>>>>>>>>>>>> processing systems, especially for probabilistic data structures like
>>>>>>>>>>>> Bloom filters, Count-Min Sketch and HyperLogLog, where it is nice to
>>>>>>>>>>>> persist the data structure and use it on a different system. For
>>>>>>>>>>>> example, generating a Bloom filter in Beam and using it inside a BQ
>>>>>>>>>>>> query without having to reimplement and test it across many platforms.
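Here is an illustration of why a fully specified structure travels well. It is a minimal Bloom filter whose bit layout and hash scheme are pinned down, so its serialized bytes could in principle be read by any other implementation of the same spec (for example one compiled to Wasm and called from a BigQuery UDF). The parameters and the SHA-256 double-hashing scheme are assumptions for this sketch, not a standard format.

```python
import hashlib
from typing import List, Optional


class BloomFilter:
    """Minimal Bloom filter with a fixed, documented layout:
    bit i lives in byte i // 8 at position i % 8, and the k indexes are
    derived from SHA-256 of the UTF-8 item via double hashing."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3,
                 bits: Optional[bytes] = None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        size = (num_bits + 7) // 8
        self.bits = bytearray(bits) if bits is not None else bytearray(size)
        if len(self.bits) != size:
            raise ValueError("serialized filter has the wrong length")

    def _indexes(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))

    def to_bytes(self) -> bytes:
        return bytes(self.bits)
```

A filter built in a pipeline could be shipped as `to_bytes()` and reloaded with `BloomFilter(1024, 3, data)` elsewhere. What every consumer must share is the spec, and a single Wasm build would avoid reimplementing it per platform.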
>>>>>>>>>>>>
>>>>>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere
>>>>>>>>>>>> V8 exists, Wasm support exists for free unless the embedder goes out of
>>>>>>>>>>>> their way to disable it, so it is supported in Deno/Node as well. In
>>>>>>>>>>>> Python, Wasm support via Wasmtime
>>>>>>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really
>>>>>>>>>>>> good. There are *many* options for execution environments; one of the
>>>>>>>>>>>> downsides of going through JS is issues with string and number
>>>>>>>>>>>> support (float/int64), AFAIK. I could be wrong, maybe JS has fixed
>>>>>>>>>>>> all this by now.
>>>>>>>>>>>>
>>>>>>>>>>>> The qualities in order of importance (for me) are
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Portability, run the same code everywhere
>>>>>>>>>>>>    2. Security, memory safety for the caller. Running Wasm
>>>>>>>>>>>>    inside of Python should never crash your Python interpreter. The capability
>>>>>>>>>>>>    model ensures that the Wasm module can only do what you allow it to
>>>>>>>>>>>>    3. Performance (portable), compile once and run everywhere
>>>>>>>>>>>>    within some margin of native.  Python makes this look good :)
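Point 2 can be illustrated with a toy link step: a module declares the host functions it imports, and the host decides which ones to provide. This is a pure-Python sketch. `instantiate` and `CapabilityError` are hypothetical, and the import names merely follow WASI preview 1 functions such as `clock_time_get`, `random_get` and `path_open`. Real runtimes likewise refuse to instantiate a module whose imports are not all supplied.

```python
import os
import time
from typing import Callable, Dict, List


class CapabilityError(PermissionError):
    """Raised when a module needs a host function the host did not grant."""


def instantiate(required_imports: List[str],
                granted: Dict[str, Callable]) -> Dict[str, Callable]:
    """Resolve a module's imports against what the host chose to grant.
    Anything not granted fails at link time, so the guest can never call it."""
    missing = [name for name in required_imports if name not in granted]
    if missing:
        raise CapabilityError("not granted: " + ", ".join(missing))
    # The guest only ever sees the functions it declared and was granted.
    return {name: granted[name] for name in required_imports}


# Hypothetical host policy for pure UDFs: a clock and randomness, no file system.
# (These Python callables ignore the real WASI signatures, which write their
# results into guest memory.)
UDF_CAPABILITIES = {
    "wasi_snapshot_preview1.clock_time_get": time.time_ns,
    "wasi_snapshot_preview1.random_get": os.urandom,
}
```

A module that asks for `wasi_snapshot_preview1.path_open` under this policy fails before any of its code runs. That is the "can only do what you allow it to" property, enforced at link time rather than by sandboxing each call.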
>>>>>>>>>>>>
>>>>>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>>>>>> objects around via Beam, so that Beam is mostly in the control plane
>>>>>>>>>>>> and computation happens in Wasm. This should reduce the serialization
>>>>>>>>>>>> overhead and also get Python out of the data path.
>>>>>>>>>>>>
>>>>>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>>>>>
>>>>>>>>>>>> Another possibly interesting avenue to explore is compiling
>>>>>>>>>>>> command line programs to WASI (WebAssembly System Interface), the
>>>>>>>>>>>> POSIX-like shim, so that they can be run in-process without the
>>>>>>>>>>>> fork/exec/pipe overhead of running a subprocess. A neat demo might be
>>>>>>>>>>>> running something like Jq <https://stedolan.github.io/jq/> inside of
>>>>>>>>>>>> a Beam job.
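To make that overhead concrete, this sketch contrasts the fork/exec/pipe path with an in-process call. It is not a WASI demo. The Python interpreter itself stands in for an external filter like jq (an assumption, so the sketch runs anywhere), and `filter_in_process` shows the plain function call that an in-process WASI build would allow instead.

```python
import json
import subprocess
import sys

# Stand-in for an external CLI filter such as jq: read JSON on stdin and
# print the "name" field as JSON on stdout.
FILTER_SCRIPT = "import json,sys; d=json.load(sys.stdin); print(json.dumps(d['name']))"


def filter_via_subprocess(record: dict) -> str:
    # The fork/exec/pipe path: a new process and two pipes for every element.
    proc = subprocess.run(
        [sys.executable, "-c", FILTER_SCRIPT],
        input=json.dumps(record),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def filter_in_process(record: dict) -> str:
    # What an in-process filter allows instead: a plain function call,
    # with no process creation and no pipes.
    return record["name"]
```

Both return `'beam'` for `{"name": "beam"}`, but only the first pays for process start-up on every element. Timing the two with `timeit` would show the gap on a given machine.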
>>>>>>>>>>>>
>>>>>>>>>>>> Not to make Wasm sound like a Python-only technology: it can also be
>>>>>>>>>>>> used from Java/the JVM via
>>>>>>>>>>>>
>>>>>>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>>>>>>
>>>>>>>>>>>> Sean
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <
>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we ever do anything with the JS runtime, this would seem
>>>>>>>>>>>>>> to be the best place to run WASM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <
>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk
>>>>>>>>>>>>>>> back in 2020 where he had integrated Rust with the Python SDK. I thought he
>>>>>>>>>>>>>>> used WebAssembly for that, but it looks like he used some other approaches,
>>>>>>>>>>>>>>> and his talk mentioned WebAssembly as future work. Not sure if that was
>>>>>>>>>>>>>>> ever explored.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested
>>>>>>>>>>>>>>>> in the WebAssembly topic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>>>>>>> branch? : )
>>>>>>>>>>>>>>>>> Even if we don't want to merge it, it would be great to
>>>>>>>>>>>>>>>>> have a PR as a way to showcase the work, its usefulness, and receive
>>>>>>>>>>>>>>>>> comments on this thread once we can see something more specific.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: Fun with WebAssembly transforms

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
+1 and this is exactly what I suggested as well. Python Dataframe,
RunInference, Python Map are available via x-lang for Java already [1] and
all three need/use simple UDFs to customize operation. There is some logic
that needs to be added to use Python transforms from Go SDK. As you
suggested there are many Java x-lang transforms that can use simple UDF
support as well. Either language combination should work to implement a
first proof of concept for WASI support while also addressing an existing
limitation.

Thanks,
Cham

[1]
https://github.com/apache/beam/tree/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms

On Wed, Jul 13, 2022 at 8:26 PM Kenneth Knowles <ke...@apache.org> wrote:

> I agree with Luke. Little helper UDFs that go along with IOs, like
> timestamp extractors that have to parse particular data formats, are
> actually a major feature gap for xlang. Targeting them could be a very
> useful place to try out the design options. I think we can simplify the
> problem by insisting that they are pure functions that do not access state
> or side inputs.
>
> On Wed, Jul 13, 2022 at 7:52 PM Luke Cwik via dev <de...@beam.apache.org>
> wrote:
>
>> I think an easier target would be to support things like
>> DynamicDestinations for Java IO connectors that are exposed as XLang for
>> Go/Python.
>>
>> This is because Go/Python have good transpiling support to WebAssembly,
>> and we have already exposed several Java IO connectors via XLang, so it's
>> about plumbing one more thing through for these IO connectors.
>>
>> What interface should we expect for UDFs / UDAFs? Should they be
>> purpose-oriented, or should we do something like we did for portability,
>> where we have a graph of transforms that we feed arbitrary data into and
>> out of? The latter would have the benefit of allowing the runner to embed
>> the language execution directly within the runner, paying the Wasm
>> communication tax instead of the gRPC communication tax. If we do the
>> former, we still have the same issues: we need a type system to pass
>> information between the host system and the transpiled WebAssembly code
>> that wraps the user's UDF/UDAF, and what if the UDF wants access to side
>> inputs or user state ...
>>
>> On Wed, Jul 13, 2022 at 4:09 PM Chamikara Jayalath <ch...@google.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>>>>
>>>
>>> These terms are defined here
>>> <https://www.fermyon.com/blog/wasm-wasi-wagi>
>>> if anybody is as confused as I am :)
>>>
>>>
>>>> WASI adds a lot of simple things like access to a clock, random number
>>>> generator, ... that would expand the scope of what transpiled code can do.
>>>> It is debatable whether we'll want the power to run the transpiled code as
>>>> a microservice. Taking UDFs for XLang and UDFs and UDAFs for SQL as our
>>>> expected use cases seems to make WASI the best choice. The issue is in the
>>>> details, as there is a hodgepodge of what language runtimes support and of
>>>> the limits of transpiling from each language to WebAssembly.
>>>>
>>>
>>> Agree that WASI seems like a good target since it gives access to
>>> additional system resources/tooling.
>>>
>>>
>>>>
>>>> Assuming WASI, it breaks down into these two aspects:
>>>> 1) Does the host language have a runtime?
>>>> Java: https://github.com/wasmerio/wasmer-java
>>>> Python: https://github.com/wasmerio/wasmer-python
>>>> Go: https://github.com/wasmerio/wasmer-go
>>>>
>>>> 2) How good is compilation from source language to WebAssembly
>>>> <https://github.com/appcypher/awesome-wasm-langs>?
>>>> Java (very limited):
>>>> Issues with garbage collection and the need to transpile/replace much
>>>> of the VM's capabilities, plus the large standard library that everyone
>>>> uses, cause a lot of challenges.
>>>> JWebAssembly can do simple things like basic classes, strings and method
>>>> calls, so it should be able to compile trivial lambdas to Wasm. There are
>>>> other choices, but to my knowledge all are very limited.
>>>>
>>>
>>> That's unfortunate. But hopefully Java support will be implemented soon?
>>>
>>>
>>>>
>>>> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
>>>> Features                 | CPython Emscripten browser | CPython Emscripten node | Pyodide
>>>> ------------------------ | -------------------------- | ----------------------- | ------------------------
>>>> subprocess (fork, exec)  | no                         | no                      | no
>>>> threads                  | no                         | YES                     | WIP
>>>> file system              | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
>>>> shared extension modules | WIP                        | WIP                     | YES
>>>> PyPI packages            | no                         | no                      | YES
>>>> sockets                  | ?                          | ?                       | ?
>>>> urllib, asyncio          | no                         | no                      | WebAPI fetch / WebSocket
>>>> signals                  | no                         | WIP                     | YES
>>>>
>>>> Go (excellent): native support in the Go compiler
>>>>
>>>
>>> Great. Could executing Go UDFs in Python x-lang transforms (for example,
>>> Dataframe, RunInference, Python Map) be a good first target?
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>>
>>>> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I have had interest in integrating Wasm within Beam as well as I have
>>>>>> had a lot of interest in improving language portability.
>>>>>>
>>>>>> Wasm has a lot of benefits over using docker containers to provide a
>>>>>> place for code to execute. From my experience working on Beam's
>>>>>> portability layer and my knowledge of internal Flume:
>>>>>> * encoding and decoding data is expensive; anything which allows
>>>>>> in-memory representations of data to be transferred from the host to
>>>>>> the guest and back without transcoding/re-interpreting will be a big win.
>>>>>> * reducing the number of times we need to pass data between guest and
>>>>>> host and back is important
>>>>>>   * fusing transforms reduces the number of data passing points
>>>>>>   * batching (row or columnar) data reduces the number of times we
>>>>>> need to pass data at each data passing point
>>>>>> * there are enough complicated use cases (state & timers, large
>>>>>> iterables, side inputs) that handling only the trivial map/flatmap use
>>>>>> case will provide little value, since it will prevent fusion
>>>>>>
>>>>>> I have been meaning to work on a prototype where we replace the
>>>>>> current gRPC + docker path with one in which we use Wasm to execute a fused
>>>>>> graph re-using large parts of the existing code base written to support
>>>>>> portability.
>>>>>>
>>>>>
>>>>> This sounds very interesting. Probably using Wasm to implement proper
>>>>> UDF support for x-lang (for example, executing Python timestamp/watermark
>>>>> functions provided through the Kafka Python x-lang wrapper on the Java
>>>>> Kafka transform) would be a good first target? My main question at
>>>>> this point is whether Wasm has adequate support for the existing SDKs that
>>>>> use x-lang to implement this in a useful way.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>>>>> Beam [1]. I'm trying to move us in that direction with
>>>>>>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in
>>>>>>> the Future Work section). This gives the Python SDK a concept of batches of
>>>>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>>>>> Arrow RecordBatches.
>>>>>>>
>>>>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release
>>>>>>> cut and I'm currently working on adding what I'm calling a "BatchConverter"
>>>>>>> [2] for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>>>>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>>>>> [2]
>>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <
>>>>>>> jensengrey@google.com> wrote:
>>>>>>>
>>>>>>>> Interesting.
>>>>>>>>
>>>>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>>>>> "golang wasm" :)
>>>>>>>>
>>>>>>>> The storage and execution grid systems are all embracing wasm in
>>>>>>>> some way.
>>>>>>>>
>>>>>>>> https://redpanda.com/
>>>>>>>> https://www.fluvio.io/
>>>>>>>> https://temporal.io/ (a Cadence fork by the Cadence folks; I met
>>>>>>>> Maxim, the lead at Temporal, at the 2020 Wasm Summit)
>>>>>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>>>>>
>>>>>>>> Keep the Wasm+Beam demos coming.
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>
>>>>>>>>> I caught up with all the replies through the web interface, but I
>>>>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>>>>>>> code available at https://github.com/sjvanrossum/beam-wasm)
>>>>>>>>> didn't come through until a bit later yesterday I think.
>>>>>>>>>
>>>>>>>>> Sean, I agree with your suggestion of Arrow as the interchange
>>>>>>>>> format for Wasm transforms and it's something I thought about exploring
>>>>>>>>> when I was adding serialization/deserialization of complex (meaning
>>>>>>>>> anything that's not an integer or float in the context of Wasm) data types
>>>>>>>>> in the demo. It's an unfortunate bit of overhead which could very well be
>>>>>>>>> solved with Arrow and shared memory between Wasm modules.
>>>>>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>>>>>> streaming data platforms like Fluvio and Redpanda, and they seem to
>>>>>>>>> incur the same overhead when moving data into and out of the guest
>>>>>>>>> context, so maybe it's negligible, but I haven't done any serious
>>>>>>>>> benchmarks yet to validate that.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Obligatory mention that WASM is basically an architecture that
>>>>>>>>>> any well-meaning compiler can target, e.g. the Go compiler:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>>>>
>>>>>>>>>> (Among many articles for the last few years)
>>>>>>>>>>
>>>>>>>>>> Robert Burke
>>>>>>>>>> Beam Go Busybody
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <
>>>>>>>>>> jensengrey@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the
>>>>>>>>>>> talk was recorded. :)
>>>>>>>>>>>
>>>>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>>>>>> rough.
>>>>>>>>>>>
>>>>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think
>>>>>>>>>>> the most compelling use is in the portability of logic between data
>>>>>>>>>>> processing systems, especially for probabilistic data structures like
>>>>>>>>>>> Bloom filters, Count-Min Sketch and HyperLogLog, where it is nice to
>>>>>>>>>>> persist the data structure and use it on a different system: for example,
>>>>>>>>>>> generating a Bloom filter in Beam and using it inside a BQ query without
>>>>>>>>>>> having to reimplement and test it across many platforms.
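A minimal sketch of what makes a persisted Bloom filter portable, assuming the only contract between systems is the byte layout and the hash scheme. The class name, the header layout (m and k as big-endian uint32) and the SHA-256 double-hashing scheme are illustrative assumptions, not a format used by Beam or BigQuery. Python's built-in hash() is deliberately avoided because it is salted per process, so it would not reproduce on another system.

```python
import hashlib
import struct
from typing import Iterator, Optional


class PortableBloomFilter:
    """A Bloom filter whose meaning is fully determined by its serialized bytes.

    Hypothetical layout: m (bit count) and k (hash count) as big-endian
    uint32, followed by ceil(m / 8) bytes of bit array.
    """

    def __init__(self, m: int, k: int, bits: Optional[bytearray] = None):
        self.m = m
        self.k = k
        self.bits = bits if bits is not None else bytearray((m + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Double hashing over SHA-256: unlike Python's salted hash(), this
        # yields the same bit positions in every process and every language.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        return struct.pack(">II", self.m, self.k) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "PortableBloomFilter":
        m, k = struct.unpack_from(">II", data, 0)
        return cls(m, k, bytearray(data[8:]))


bf = PortableBloomFilter(m=1024, k=5)
bf.add(b"alice")
blob = bf.to_bytes()  # e.g. persisted from a pipeline as a BYTES value
restored = PortableBloomFilter.from_bytes(blob)
print(restored.might_contain(b"alice"))
```

Any other engine only has to implement the same layout and hashing to query the blob; shipping one implementation compiled to Wasm is one way to avoid reimplementing and retesting it per platform.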
>>>>>>>>>>>
>>>>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>>>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>>>>>>>> support via Wasmtime
>>>>>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really good.
>>>>>>>>>>> There are *many* options for execution environments; one of the downsides
>>>>>>>>>>> of going through a JS one is string and number (float/int64) support
>>>>>>>>>>> issues, afaik. I could be wrong; maybe JS has fixed all this by now.
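To illustrate the int64 concern: a plain JS Number is an IEEE 754 double, so integers are only guaranteed exact up to Number.MAX_SAFE_INTEGER (2**53 - 1). A small Python sketch (CPython's float is also a 64-bit double on mainstream platforms) that round-trips int64 values through a double, as would happen if they were squeezed into a Number; the through_double helper is hypothetical:

```python
# JS Number.MAX_SAFE_INTEGER: the largest n such that n and n + 1 are both
# exactly representable as an IEEE 754 double.
MAX_SAFE_INTEGER = 2**53 - 1


def through_double(value: int) -> int:
    """Round-trip an integer through a 64-bit float, as a JS Number would."""
    return int(float(value))


for v in [MAX_SAFE_INTEGER, 2**53 + 1, 2**63 - 1]:
    # Prints the original value, the value after the round trip, and
    # whether the round trip preserved it.
    print(v, through_double(v), through_double(v) == v)
```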
>>>>>>>>>>>
>>>>>>>>>>> The qualities in order of importance (for me) are:
>>>>>>>>>>>
>>>>>>>>>>>    1. Portability: run the same code everywhere.
>>>>>>>>>>>    2. Security: memory safety for the caller. Running Wasm
>>>>>>>>>>>    inside of Python should never crash your Python interpreter. The capability
>>>>>>>>>>>    model ensures that the Wasm module can only do what you allow it to.
>>>>>>>>>>>    3. Performance (portable): compile once and run everywhere
>>>>>>>>>>>    within some margin of native. Python makes this look good :)
>>>>>>>>>>>
>>>>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>>>>> objects around via Beam, so that Beam is now mostly in the control plane
>>>>>>>>>>> and computation happens in Wasm. This should reduce the serialization
>>>>>>>>>>> overhead and also get Python out of the datapath.
>>>>>>>>>>>
>>>>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>>>>
>>>>>>>>>>> Another possibly interesting avenue to explore is compiling
>>>>>>>>>>> command-line programs to WASI (WebAssembly System Interface), the
>>>>>>>>>>> POSIX-like shim, so that they can be run in-process without the
>>>>>>>>>>> fork/exec/pipe overhead of running a subprocess. A neat demo might be
>>>>>>>>>>> running something like jq <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>>>>>>>
>>>>>>>>>>> Not to make Wasm sound like a Python-only technology: it can also be
>>>>>>>>>>> used from Java/JVM via
>>>>>>>>>>>
>>>>>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>>>>>
>>>>>>>>>>> Sean
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <
>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> If we ever do anything with the JS runtime, this would seem to
>>>>>>>>>>>>> be the best place to run WASM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <
>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk
>>>>>>>>>>>>>> back in 2020 where he had integrated Rust with the Python SDK. I thought he
>>>>>>>>>>>>>> used WebAssembly for that, but it looks like he used some other approaches,
>>>>>>>>>>>>>> and his talk mentioned WebAssembly as future work. Not sure if that was
>>>>>>>>>>>>>> ever explored.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested
>>>>>>>>>>>>>>> in the WebAssembly topic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>>>>>> branch? : )
>>>>>>>>>>>>>>>> Even if we don't want to merge it, it would be great to
>>>>>>>>>>>>>>>> have a PR as a way to showcase the work, its usefulness, and receive
>>>>>>>>>>>>>>>> comments on this thread once we can see something more specific.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've got a small proof of concept running in the Python
>>>>>>>>>>>>>>>>> SDK as a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>>>>>>> the guest's memory.
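The round trip described above can be simulated without a Wasm runtime. In this sketch a bytearray stands in for the guest's linear memory, a bump allocator stands in for whatever allocation the module exports, and UTF-8 text stands in for the RowCoder encoding. GuestMemory, alloc, guest_process and host_emit are hypothetical names; with a real runtime such as Wasmer, guest_process would be compiled Wasm and host_emit an imported host function.

```python
from typing import Callable, List


class GuestMemory:
    """Stand-in for a Wasm module's linear memory plus a bump allocator."""

    def __init__(self, size: int = 64 * 1024):  # one 64 KiB Wasm page
        self.buf = bytearray(size)
        self.next_free = 0

    def alloc(self, n: int) -> int:
        """Hypothetical exported allocator: returns an offset, never frees."""
        if self.next_free + n > len(self.buf):
            raise MemoryError("out of linear memory")
        ptr = self.next_free
        self.next_free += n
        return ptr


def guest_process(mem: GuestMemory, ptr: int, length: int,
                  emit: Callable[[int, int], None]) -> None:
    """'Guest' side: copy the input elsewhere in memory, then call the host."""
    out_ptr = mem.alloc(length)
    mem.buf[out_ptr:out_ptr + length] = mem.buf[ptr:ptr + length]
    emit(out_ptr, length)  # only two integers cross the boundary


def run_element(mem: GuestMemory, element: str) -> List[str]:
    """'Host' side: encode, copy in, invoke the guest, decode what it emits."""
    outputs: List[str] = []

    def host_emit(ptr: int, length: int) -> None:
        # Imported host function: copy out of guest memory, then decode.
        outputs.append(bytes(mem.buf[ptr:ptr + length]).decode("utf-8"))

    encoded = element.encode("utf-8")  # stand-in for a RowCoder encoding
    ptr = mem.alloc(len(encoded))
    mem.buf[ptr:ptr + len(encoded)] = encoded
    guest_process(mem, ptr, len(encoded), host_emit)
    return outputs


mem = GuestMemory()
print(run_element(mem, "hello wasm"), mem.next_free)
```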
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>>>>>>>> sjvanrossum@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>>>>>>

Re: Fun with WebAssembly transforms

Posted by Kenneth Knowles <ke...@apache.org>.
I agree with Luke. Little helper UDFs that go along with IOs, like timestamp
extractors that have to parse particular data formats, are actually a major
feature gap for xlang, so targeting them could be a very useful way to try
out the design options. I think we can simplify the problem by insisting
that they are pure functions that do not access state or side inputs.
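For illustration, a sketch of the kind of pure helper UDF meant here: a timestamp extractor that sees only the raw record bytes and returns epoch milliseconds, with no state, side inputs, clock or I/O. The record format (an ISO 8601 UTC timestamp as the first CSV field) and the function name are hypothetical. A bytes-in, int64-out signature like this would map onto a Wasm export that takes a pointer and a length and returns an i64.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extract_timestamp_millis(record: bytes) -> int:
    """Pure helper UDF: raw record bytes in, epoch milliseconds (int64) out.

    Assumes a hypothetical CSV format whose first field is a UTC timestamp
    such as 2022-07-13T19:52:00Z.
    """
    first_field = record.split(b",", 1)[0].decode("ascii")
    parsed = datetime.strptime(first_field, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # timedelta // timedelta is exact integer division, so no float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(extract_timestamp_millis(b"1970-01-01T00:00:01Z,user42,click"))
```

Because the function depends only on its input, the runner is free to call it wherever the element happens to be, which is what makes the restriction attractive.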

On Wed, Jul 13, 2022 at 7:52 PM Luke Cwik via dev <de...@beam.apache.org>
wrote:

> I think an easier target would be to support things like
> DynamicDestinations for Java IO connectors that are exposed as XLang for
> Go/Python.
>
> This is because Go/Python have good transpiling support to WebAssembly,
> and we have already exposed several Java IO connectors via XLang, so it's
> about plumbing one more thing through for these IO connectors.
>
> What interface should we expect for UDFs / UDAFs? Should they be
> purpose-oriented, or should we do something like we did for portability,
> where we have a graph of transforms that we feed arbitrary data into and
> out of? The latter would have the benefit of allowing the runner to embed
> the language execution directly within the runner, and would pay the Wasm
> communication tax instead of the gRPC communication tax. If we do the
> former, we still have the same issue of needing a type system to pass
> information between the host system and the transpiled WebAssembly code
> that wraps the user's UDF/UDAF, and what if the UDF wants access to side
> inputs or user state ...
>
> On Wed, Jul 13, 2022 at 4:09 PM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>>
>>
>> On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>>>
>>
>> These terms are defined here
>> <https://www.fermyon.com/blog/wasm-wasi-wagi>
>> if anybody is as confused as I am :)
>>
>>
>>> WASI adds a lot of simple things like access to a clock, random number
>>> generator, ... that would expand the scope of what transpiled code can do.
>>> It is debatable whether we'll want the power to run the transpiled code as
>>> a microservice. Using UDFs for XLang and UDFs and UDAFs for SQL as our
>>> expected use cases seems to make WASI the best choice. The issue is in the
>>> details, as there is a hodgepodge of what language runtimes support and of
>>> what the limits of transpiling from a language to WebAssembly are.
>>>
>>
>> Agree that WASI seems like a good target since it gives access to
>> additional system resources/tooling.
>>
>>
>>>
>>> Assuming WASI then it breaks down to these two aspects:
>>> 1) Does the host language have a runtime?
>>> Java: https://github.com/wasmerio/wasmer-java
>>> Python: https://github.com/wasmerio/wasmer-python
>>> Go: https://github.com/wasmerio/wasmer-go
>>>
>>> 2) How good is compilation from source language to WebAssembly
>>> <https://github.com/appcypher/awesome-wasm-langs>?
>>> Java (very limited):
>>> Issues with garbage collection and the need to transpile/replace much of
>>> the VM's capabilities plus the large standard library that everyone uses
>>> cause a lot of challenges.
>>> JWebAssembly can do simple things like basic classes, strings, method
>>> calls. Should be able to compile trivial lambdas to Wasm. There are other
>>> choices but to my knowledge all are very limited.
>>>
>>
>> That's unfortunate. But hopefully Java support will be implemented soon?
>>
>>
>>>
>>> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
>>> Features                 | CPython Emscripten browser | CPython Emscripten node | Pyodide
>>> -------------------------+----------------------------+-------------------------+-------------------------
>>> subprocess (fork, exec)  | no                         | no                      | no
>>> threads                  | no                         | YES                     | WIP
>>> file system              | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
>>> shared extension modules | WIP                        | WIP                     | YES
>>> PyPI packages            | no                         | no                      | YES
>>> sockets                  | ?                          | ?                       | ?
>>> urllib, asyncio          | no                         | no                      | WebAPI fetch / WebSocket
>>> signals                  | no                         | WIP                     | YES
>>>
>>> Go (excellent): native support in the Go compiler
>>>
>>
>> Great. Could executing Go UDFs in Python x-lang transforms (for example,
>> DataFrame, RunInference, Python Map) be a good first target?
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> I have been interested in integrating Wasm within Beam, and I have also
>>>>> had a lot of interest in improving language portability.
>>>>>
>>>>> Wasm has a lot of benefits over using Docker containers to provide a
>>>>> place for code to execute. From experience working on Beam's portability
>>>>> layer and knowledge of internal Flume:
>>>>> * encoding and decoding data is expensive; anything which allows in-memory
>>>>> representations of data to be transferred from the host to the guest and
>>>>> back without transcoding/re-interpreting will be a big win.
>>>>> * reducing the number of times we need to pass data between guest and
>>>>> host and back is important
>>>>>   * fusing transforms reduces the number of data passing points
>>>>>   * batching (row or columnar) data reduces the number of times we
>>>>> need to pass data at each data passing point
>>>>> * there are enough complicated use cases (state & timers, large
>>>>> iterables, side inputs) that handling only the trivial map/flatmap use
>>>>> case will provide little value, since it will prevent fusion
>>>>>
>>>>> I have been meaning to work on a prototype where we replace the
>>>>> current gRPC + docker path with one in which we use Wasm to execute a fused
>>>>> graph re-using large parts of the existing code base written to support
>>>>> portability.
>>>>>
>>>>
>>>> This sounds very interesting. Perhaps using Wasm to implement proper
>>>> UDF support for x-lang (for example, executing Python timestamp/watermark
>>>> functions provided through the Kafka Python x-lang wrapper on the Java
>>>> Kafka transform) would be a good first target? My main question for this at
>>>> this point is whether Wasm has adequate support for existing SDKs that use
>>>> x-lang to implement this in a useful way.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>>>> Beam [1]. I'm trying to move us in that direction with
>>>>>> https://s.apache.org/batched-dofns (arrow is discussed briefly in
>>>>>> the Future Work section). This gives the Python SDK a concept of batches of
>>>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>>>> Arrow RecordBatches.
>>>>>>
>>>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release
>>>>>> cut and I'm currently working on adding what I'm calling a "BatchConverter"
>>>>>> [2] for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>>>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>>>> [2]
>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <
>>>>>> jensengrey@google.com> wrote:
>>>>>>
>>>>>>> Interesting.
>>>>>>>
>>>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>>>> "golang wasm" :)
>>>>>>>
>>>>>>> The storage and execution grid systems are all embracing wasm in
>>>>>>> some way.
>>>>>>>
>>>>>>> https://redpanda.com/
>>>>>>> https://www.fluvio.io/
>>>>>>> https://temporal.io/ (Cadence fork by the Cadence folks, I met
>>>>>>> Maxim the lead at Temporal at the 2020 Wasm Summit)
>>>>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>>>>
>>>>>>> Keep the Wasm+Beam demos coming.
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>
>>>>>>>> I caught up with all the replies through the web interface, but I
>>>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>>>>>> code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>>>>>> come through until a bit later yesterday I think.
>>>>>>>>
>>>>>>>> Sean, I agree with your suggestion of Arrow as the interchange
>>>>>>>> format for Wasm transforms and it's something I thought about exploring
>>>>>>>> when I was adding serialization/deserialization of complex (meaning
>>>>>>>> anything that's not an integer or float in the context of Wasm) data types
>>>>>>>> in the demo. It's an unfortunate bit of overhead which could very well be
>>>>>>>> solved with Arrow and shared memory between Wasm modules.
>>>>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>>>>>> the same overhead when moving data into and out of the guest context so
>>>>>>>> maybe it's negligible, but I haven't done any serious benchmarking yet to
>>>>>>>> validate that.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>>>>> well-meaning compiler can target, e.g. the Go compiler:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>>>
>>>>>>>>> (Among many articles for the last few years)
>>>>>>>>>
>>>>>>>>> Robert Burke
>>>>>>>>> Beam Go Busybody
>>>>>>>>>
>>>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <
>>>>>>>>> jensengrey@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the
>>>>>>>>>> talk was recorded. :)
>>>>>>>>>>
>>>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>>>>> rough.
>>>>>>>>>>
>>>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>>>>> most compelling use is in the portability of logic between data processing
>>>>>>>>>> systems, especially for probabilistic data structures like Bloom
>>>>>>>>>> filters, Count-Min Sketch and HyperLogLog, where it is nice to persist the
>>>>>>>>>> data structure and use it on a different system: for example, generating a
>>>>>>>>>> Bloom filter in Beam and using it inside a BQ query without having to
>>>>>>>>>> reimplement and test it across many platforms.
>>>>>>>>>>
>>>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>>>>>>> support via Wasmtime
>>>>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really good.
>>>>>>>>>> There are *many* options for execution environments; one of the downsides
>>>>>>>>>> of going through a JS one is string and number (float/int64) support
>>>>>>>>>> issues, afaik. I could be wrong; maybe JS has fixed all this by now.
>>>>>>>>>>
>>>>>>>>>> The qualities in order of importance (for me) are:
>>>>>>>>>>
>>>>>>>>>>    1. Portability: run the same code everywhere.
>>>>>>>>>>    2. Security: memory safety for the caller. Running Wasm
>>>>>>>>>>    inside of Python should never crash your Python interpreter. The capability
>>>>>>>>>>    model ensures that the Wasm module can only do what you allow it to.
>>>>>>>>>>    3. Performance (portable): compile once and run everywhere
>>>>>>>>>>    within some margin of native. Python makes this look good :)
>>>>>>>>>>
>>>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>>>> objects around via Beam, so that Beam is now mostly in the control plane
>>>>>>>>>> and computation happens in Wasm. This should reduce the serialization
>>>>>>>>>> overhead and also get Python out of the datapath.
>>>>>>>>>>
>>>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>>>
>>>>>>>>>> Another possibly interesting avenue to explore is compiling
>>>>>>>>>> command-line programs to WASI (WebAssembly System Interface), the
>>>>>>>>>> POSIX-like shim, so that they can be run in-process without the
>>>>>>>>>> fork/exec/pipe overhead of running a subprocess. A neat demo might be
>>>>>>>>>> running something like jq <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>>>>>>
>>>>>>>>>> Not to make Wasm sound like a Python-only technology: it can also be
>>>>>>>>>> used from Java/JVM via
>>>>>>>>>>
>>>>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>>>>
>>>>>>>>>> Sean
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If we ever do anything with the JS runtime, this would seem to
>>>>>>>>>>>> be the best place to run WASM.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <
>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk
>>>>>>>>>>>>> back in 2020 where he had integrated Rust with the Python SDK. I thought he
>>>>>>>>>>>>> used WebAssembly for that, but it looks like he used some other approaches,
>>>>>>>>>>>>> and his talk mentioned WebAssembly as future work. Not sure if that was
>>>>>>>>>>>>> ever explored.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested
>>>>>>>>>>>>>> in the WebAssembly topic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>>>>> branch? : )
>>>>>>>>>>>>>>> Even if we don't want to merge it, it would be great to have
>>>>>>>>>>>>>>> a PR as a way to showcase the work, its usefulness, and receive comments on
>>>>>>>>>>>>>>> this thread once we can see something more specific.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK
>>>>>>>>>>>>>>>> as a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>>>>>>> sjvanrossum@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>>>>>

Re: Fun with WebAssembly transforms

Posted by Luke Cwik via dev <de...@beam.apache.org>.
I think an easier target would be to support things like
DynamicDestinations for Java IO connectors that are exposed as XLang for
Go/Python.

This is because Go/Python have good transpiling support to WebAssembly, and
we have already exposed several Java IO connectors via XLang, so it's about
plumbing one more thing through for these IO connectors.

What interface should we expect for UDFs / UDAFs? Should they be
purpose-oriented, or should we do something like we did for portability,
where we have a graph of transforms that we feed arbitrary data into and
out of? The latter would have the benefit of allowing the runner to embed
the language execution directly within the runner, and would pay the Wasm
communication tax instead of the gRPC communication tax. If we do the
former, we still have the same issue of needing a type system to pass
information between the host system and the transpiled WebAssembly code
that wraps the user's UDF/UDAF, and what if the UDF wants access to side
inputs or user state ...
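One way to picture the type system requirement: core Wasm function signatures only carry numeric values (i32, i64, f32, f64, plus vector and opaque reference types in newer revisions), so anything richer has to cross as bytes in a layout both host and guest agree on. A minimal sketch of such a schema-driven contract, assuming three hypothetical field types (int64, float64, UTF-8 string), encoded little-endian like Wasm linear memory, with uint32 length-prefixed strings. This is not Beam's RowCoder wire format.

```python
import struct
from typing import Any, Dict, List, Tuple

# Hypothetical schema: ordered (field_name, field_type) pairs.
Schema = List[Tuple[str, str]]


def encode_row(schema: Schema, row: Dict[str, Any]) -> bytes:
    """Encode a row as little-endian fields in schema order."""
    out = bytearray()
    for name, ftype in schema:
        value = row[name]
        if ftype == "int64":
            out += struct.pack("<q", value)
        elif ftype == "float64":
            out += struct.pack("<d", value)
        elif ftype == "string":
            data = value.encode("utf-8")
            out += struct.pack("<I", len(data)) + data  # uint32 length prefix
        else:
            raise ValueError(f"unsupported type: {ftype}")
    return bytes(out)


def decode_row(schema: Schema, data: bytes) -> Dict[str, Any]:
    """Inverse of encode_row; both sides must share the same schema."""
    row: Dict[str, Any] = {}
    offset = 0
    for name, ftype in schema:
        if ftype == "int64":
            (row[name],) = struct.unpack_from("<q", data, offset)
            offset += 8
        elif ftype == "float64":
            (row[name],) = struct.unpack_from("<d", data, offset)
            offset += 8
        elif ftype == "string":
            (length,) = struct.unpack_from("<I", data, offset)
            offset += 4
            row[name] = data[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unsupported type: {ftype}")
    return row


schema = [("user_id", "int64"), ("score", "float64"), ("name", "string")]
blob = encode_row(schema, {"user_id": 42, "score": 0.5, "name": "ada"})
print(len(blob), decode_row(schema, blob))
```

Side inputs and user state are exactly what such a per-call contract does not cover, which is the open question raised above.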

On Wed, Jul 13, 2022 at 4:09 PM Chamikara Jayalath <ch...@google.com>
wrote:

>
>
> On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>
>> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>>
>
> These terms are defined here
> <https://www.fermyon.com/blog/wasm-wasi-wagi>
> if anybody is as confused as I am :)
>
>
>> WASI adds a lot of simple things like access to a clock, random number
>> generator, ... that would expand the scope of what transpiled code can do.
>> It is debatable whether we'll want the power to run the transpiled code as
>> a microservice. Using UDFs for XLang and UDFs and UDAFs for SQL as our
>> expected use cases seems to make WASI the best choice. The issue is in the
>> details, as there is a hodgepodge of what language runtimes support and of
>> what the limits of transpiling from a language to WebAssembly are.
>>
>
> Agree that WASI seems like a good target since it gives access to
> additional system resources/tooling.
>
>
>>
>> Assuming WASI then it breaks down to these two aspects:
>> 1) Does the host language have a runtime?
>> Java: https://github.com/wasmerio/wasmer-java
>> Python: https://github.com/wasmerio/wasmer-python
>> Go: https://github.com/wasmerio/wasmer-go
>>
>> 2) How good is compilation from source language to WebAssembly
>> <https://github.com/appcypher/awesome-wasm-langs>?
>> Java (very limited):
>> Issues with garbage collection and the need to transpile/replace much of
>> the VM's capabilities plus the large standard library that everyone uses
>> cause a lot of challenges.
>> JWebAssembly can do simple things like basic classes, strings, method
>> calls. Should be able to compile trivial lambdas to Wasm. There are other
>> choices but to my knowledge all are very limited.
>>
>
> That's unfortunate. But hopefully Java support will be implemented soon?
>
>
>>
>> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
>> Features                 | CPython Emscripten browser | CPython Emscripten node | Pyodide
>> -------------------------+----------------------------+-------------------------+-------------------------
>> subprocess (fork, exec)  | no                         | no                      | no
>> threads                  | no                         | YES                     | WIP
>> file system              | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
>> shared extension modules | WIP                        | WIP                     | YES
>> PyPI packages            | no                         | no                      | YES
>> sockets                  | ?                          | ?                       | ?
>> urllib, asyncio          | no                         | no                      | WebAPI fetch / WebSocket
>> signals                  | no                         | WIP                     | YES
>>
>> Go (excellent): native support in the Go compiler
>>
>
> Great. Could executing Go UDFs in Python x-lang transforms (for example,
> DataFrame, RunInference, Python Map) be a good first target?
>
> Thanks,
> Cham
>
>
>>
>> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>>
>>>
>>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> I have been interested in integrating Wasm within Beam, and I have also
>>>> had a lot of interest in improving language portability.
>>>>
>>>> Wasm has a lot of benefits over using Docker containers to provide a
>>>> place for code to execute. From experience working on Beam's portability
>>>> layer and knowledge of internal Flume:
>>>> * encoding and decoding data is expensive; anything which allows in-memory
>>>> representations of data to be transferred from the host to the guest and
>>>> back without transcoding/re-interpreting will be a big win.
>>>> * reducing the number of times we need to pass data between guest and
>>>> host and back is important
>>>>   * fusing transforms reduces the number of data passing points
>>>>   * batching (row or columnar) data reduces the number of times we need
>>>> to pass data at each data passing point
>>>> * there are enough complicated use cases (state & timers, large
>>>> iterables, side inputs) that handling only the trivial map/flatmap use
>>>> case will provide little value, since it will prevent fusion
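The two data-passing bullets above can be made concrete by counting host/guest crossings. A sketch, assuming a hypothetical BoundaryCounter.call_guest that stands in for one transfer into the guest (one copy into linear memory plus one exported-function call), with double_all standing in for a fused graph of transforms:

```python
from typing import Callable, Iterable, List


class BoundaryCounter:
    """Wraps a (simulated) guest function and counts host->guest crossings."""

    def __init__(self, guest_fn: Callable[[List[int]], List[int]]):
        self.guest_fn = guest_fn
        self.crossings = 0

    def call_guest(self, batch: List[int]) -> List[int]:
        self.crossings += 1  # each call pays the fixed transfer cost once
        return self.guest_fn(batch)


def process(elements: Iterable[int], counter: BoundaryCounter,
            batch_size: int) -> List[int]:
    """Send elements to the guest in batches of up to batch_size."""
    results: List[int] = []
    batch: List[int] = []
    for e in elements:
        batch.append(e)
        if len(batch) == batch_size:
            results.extend(counter.call_guest(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.extend(counter.call_guest(batch))
    return results


def double_all(xs: List[int]) -> List[int]:
    """Stand-in for a fused graph of transforms running in the guest."""
    return [2 * x for x in xs]


per_element = BoundaryCounter(double_all)
batched = BoundaryCounter(double_all)
assert process(range(10), per_element, 1) == process(range(10), batched, 4)
print(per_element.crossings, batched.crossings)
```

Fusing more transforms into the guest function reduces how many such points exist; batching reduces how often each one is crossed.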
>>>>
>>>> I have been meaning to work on a prototype where we replace the current
>>>> gRPC + docker path with one in which we use Wasm to execute a fused graph
>>>> re-using large parts of the existing code base written to support
>>>> portability.
>>>>
>>>
>>> This sounds very interesting. Perhaps using Wasm to implement proper
>>> UDF support for x-lang (for example, executing Python timestamp/watermark
>>> functions provided through the Kafka Python x-lang wrapper on the Java
>>> Kafka transform) would be a good first target? My main question for this at
>>> this point is whether Wasm has adequate support for existing SDKs that use
>>> x-lang to implement this in a useful way.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>>> Beam [1]. I'm trying to move us in that direction with
>>>>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
>>>>> Future Work section). This gives the Python SDK a concept of batches of
>>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>>> Arrow RecordBatches.
>>>>>
>>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>>>>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>>>>> for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>>>
>>>>> Brian
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>>> [2]
>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>>>
>>>>>
>>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <
>>>>> jensengrey@google.com> wrote:
>>>>>
>>>>>> Interesting.
>>>>>>
>>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>>> "golang wasm" :)
>>>>>>
>>>>>> The storage and execution grid systems are all embracing wasm in some
>>>>>> way.
>>>>>>
>>>>>> https://redpanda.com/
>>>>>> https://www.fluvio.io/
>>>>>> https://temporal.io/ (a Cadence fork by the Cadence folks; I met Maxim,
>>>>>> the lead at Temporal, at the 2020 Wasm Summit)
>>>>>> https://github.com/pachyderm/pachyderm (no mention of Wasm yet)
>>>>>>
>>>>>> Keep the Wasm+Beam demos coming.
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>>>> sjvanrossum@google.com> wrote:
>>>>>>
>>>>>>> I caught up with all the replies through the web interface, but I
>>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>>>>> code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>>>>> come through until a bit later yesterday I think.
>>>>>>>
>>>>>>> Sean, I agree with your suggestion of Arrow as the interchange
>>>>>>> format for Wasm transforms and it's something I thought about exploring
>>>>>>> when I was adding serialization/deserialization of complex (meaning
>>>>>>> anything that's not an integer or float in the context of Wasm) data types
>>>>>>> in the demo. It's an unfortunate bit of overhead which could very well be
>>>>>>> solved with Arrow and shared memory between Wasm modules.
>>>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>>>> streaming data platforms like Fluvio and Redpanda, and they seem to incur
>>>>>>> the same overhead when moving data into and out of the guest context, so
>>>>>>> maybe it's negligible, but I haven't done any serious benchmarking yet to
>>>>>>> validate that.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>>>>>
>>>>>>>>
>>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>>
>>>>>>>> (Among many articles from the last few years)
>>>>>>>>
>>>>>>>> Robert Burke
>>>>>>>> Beam Go Busybody
>>>>>>>>
>>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <
>>>>>>>> jensengrey@google.com> wrote:
>>>>>>>>
>>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>>>>> was recorded. :)
>>>>>>>>>
>>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>>>> rough.
>>>>>>>>>
>>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>>>> most compelling use is in the portability of logic between data processing
>>>>>>>>> systems, especially for probabilistic data structures like Bloom
>>>>>>>>> filters, Count-Min Sketch, and HyperLogLog, where it is nice to persist the
>>>>>>>>> data structure and use it on a different system, for example generating a Bloom
>>>>>>>>> filter in Beam and using it inside a BQ query without having to reimplement
>>>>>>>>> and test it across many platforms.
>>>>>>>>>
>>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>>>>>> support via Wasmtime
>>>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really good.
>>>>>>>>> There are *many* options for execution environments; one of the downsides
>>>>>>>>> of going through JS is string and number (float/int64) support
>>>>>>>>> issues, afaik. I could be wrong; maybe JS has fixed all this by now.
>>>>>>>>>
>>>>>>>>> The qualities in order of importance (for me) are
>>>>>>>>>
>>>>>>>>>    1. Portability: run the same code everywhere.
>>>>>>>>>    2. Security: memory safety for the caller. Running Wasm inside
>>>>>>>>>    Python should never crash your Python interpreter. The capability model
>>>>>>>>>    ensures that the Wasm module can only do what you allow it to.
>>>>>>>>>    3. Performance (portable): compile once and run everywhere
>>>>>>>>>    within some margin of native. Python makes this look good :)
>>>>>>>>>
>>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>>> objects around via Beam, so that Beam is now mostly in the control plane
>>>>>>>>> and computation happens in Wasm. This should reduce the serialization
>>>>>>>>> overhead and also get Python out of the data path.
>>>>>>>>>
>>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>>
>>>>>>>>> Another possibly interesting avenue to explore is compiling
>>>>>>>>> command-line programs to WASI (WebAssembly System Interface), the
>>>>>>>>> POSIX-like shim, so that they can be run in-process without the fork/exec/pipe
>>>>>>>>> overhead of running a subprocess. A neat demo might be running something
>>>>>>>>> like jq <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>>>>>
>>>>>>>>> Wasm isn't a Python-only technology, though; it can also be used
>>>>>>>>> from Java/the JVM via
>>>>>>>>>
>>>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>>>
>>>>>>>>> Sean
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> If we ever do anything with the JS runtime, this would seem to
>>>>>>>>>>> be the best place to run WASM.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <
>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk
>>>>>>>>>>>> back in 2020 where he had integrated Rust with the Python SDK. I thought he
>>>>>>>>>>>> used WebAssembly for that, but it looks like he used some other approaches,
>>>>>>>>>>>> and his talk mentioned WebAssembly as future work. Not sure if that was
>>>>>>>>>>>> ever explored.
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>>
>>>>>>>>>>>> Brian
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>>>> branch? : )
>>>>>>>>>>>>>> Even if we don't want to merge it, it would be great to have
>>>>>>>>>>>>>> a PR as a way to showcase the work, its usefulness, and receive comments on
>>>>>>>>>>>>>> this thread once we can see something more specific.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK
>>>>>>>>>>>>>>> as a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
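The round trip described in Steven's original message (quoted above) can be modelled without any Wasm runtime at all. The following is a pure-Python simulation under stated assumptions: a `bytearray` stands in for the guest's linear memory, a Python callable stands in for the imported host function, and `encode_row`/`decode_row` are a hypothetical fixed layout used in place of Beam's RowCoder. None of these names are taken from Steven's sample code.

```python
import struct
from typing import Callable, List, Tuple

# Hypothetical stand-in for Beam's RowCoder: an (int64, str) row packed as
# little-endian int64 + uint32 byte length + UTF-8 bytes. This is NOT the
# real RowCoder wire format.
HEADER = struct.Struct("<qI")


def encode_row(n: int, s: str) -> bytes:
    data = s.encode("utf-8")
    return HEADER.pack(n, len(data)) + data


def decode_row(buf: bytes) -> Tuple[int, str]:
    n, length = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    return n, buf[start:start + length].decode("utf-8")


class ToyGuest:
    """Simulated guest: a single linear memory plus a bump allocator."""

    PAGE_SIZE = 65536  # Wasm linear memory is sized in 64 KiB pages

    def __init__(self, pages: int, host_emit: Callable[[int, int], None]):
        self.memory = bytearray(pages * self.PAGE_SIZE)
        self.heap_top = 0
        self.host_emit = host_emit  # plays the role of an imported host function

    def alloc(self, size: int) -> int:
        offset = self.heap_top
        if offset + size > len(self.memory):
            raise MemoryError("guest linear memory exhausted")
        self.heap_top += size
        return offset

    def process(self, offset: int, size: int) -> None:
        # The guest copies the input elsewhere in its own memory, then calls
        # back to the host with the new offset and size.
        out = self.alloc(size)
        self.memory[out:out + size] = self.memory[offset:offset + size]
        self.host_emit(out, size)


def round_trip(n: int, s: str) -> List[Tuple[int, str]]:
    outputs: List[Tuple[int, str]] = []

    def host_emit(offset: int, size: int) -> None:
        # The host copies the bytes out of guest memory and decodes them.
        outputs.append(decode_row(bytes(guest.memory[offset:offset + size])))

    guest = ToyGuest(pages=1, host_emit=host_emit)
    encoded = encode_row(n, s)
    in_ptr = guest.alloc(len(encoded))                     # reserve guest memory
    guest.memory[in_ptr:in_ptr + len(encoded)] = encoded   # host copies element in
    guest.process(in_ptr, len(encoded))                    # host calls the guest
    return outputs


print(round_trip(42, "hello"))  # [(42, 'hello')]
```

Each element costs an encode, a copy into the guest, a copy back out and a decode, which is the overhead Sean and Steven suggest Arrow and shared memory could reduce.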

Re: Fun with WebAssembly transforms

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:

> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>

These terms are defined here
<https://www.fermyon.com/blog/wasm-wasi-wagi?gclid=CjwKCAjw2rmWBhB4EiwAiJ0mtVhiTuMZmy4bJSlk4nJj1deNX3KueomLgkG8JMyGeiHJ3FJRPpVn7BoCs58QAvD_BwE>
if anybody is as confused as I am :)


> WASI adds a lot of simple things like access to a clock, a random number
> generator, ... that would expand the scope of what transpiled code can do.
> It is debatable whether we'll want the power to run the transpiled code as
> a microservice. With UDFs for XLang, and UDFs and UDAFs for SQL, as our
> expected use cases, WASI seems to be the best choice. The issue is in the
> details, as there is a hodgepodge of what language runtimes support and of
> what the limits of transpiling from a language to WebAssembly are.
>

Agree that WASI seems like a good target since it gives access to
additional system resources/tooling.


>
> Assuming WASI then it breaks down to these two aspects:
> 1) Does the host language have a runtime?
> Java: https://github.com/wasmerio/wasmer-java
> Python: https://github.com/wasmerio/wasmer-python
> Go: https://github.com/wasmerio/wasmer-go
>
> 2) How good is compilation from source language to WebAssembly
> <https://github.com/appcypher/awesome-wasm-langs>?
> Java (very limited):
> Issues with garbage collection and the need to transpile/replace much of
> the VM's capabilities, plus the large standard library that everyone uses,
> cause a lot of challenges.
> JWebAssembly can do simple things like basic classes, strings, and method
> calls. It should be able to compile trivial lambdas to Wasm. There are other
> choices, but to my knowledge all are very limited.
>

That's unfortunate. But hopefully Java support will be implemented soon?


>
> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
> Feature                  | CPython/Emscripten (browser) | CPython/Emscripten (Node) | Pyodide
> -------------------------+------------------------------+---------------------------+-------------------------
> subprocess (fork, exec)  | no                           | no                        | no
> threads                  | no                           | YES                       | WIP
> file system              | no (only MEMFS)              | YES (Node raw FS)         | YES (IDB, Node, …)
> shared extension modules | WIP                          | WIP                       | YES
> PyPI packages            | no                           | no                        | YES
> sockets                  | ?                            | ?                         | ?
> urllib, asyncio          | no                           | no                        | WebAPI fetch / WebSocket
> signals                  | no                           | WIP                       | YES
>
> Go (excellent): native support in the Go compiler
>

Great. Could executing Go UDFs in Python x-lang transforms (for example,
DataFrame, RunInference, Python Map) be a good first target?

Thanks,
Cham



Re: Fun with WebAssembly transforms

Posted by Luke Cwik via dev <de...@beam.apache.org>.
First we'll want to choose whether we want to target Wasm, WASI or Wagi.
WASI adds a lot of simple things like access to a clock, a random number
generator, ... that would expand the scope of what transpiled code can do.
It is debatable whether we'll want the power to run the transpiled code as
a microservice. With UDFs for XLang, and UDFs and UDAFs for SQL, as our
expected use cases, WASI seems to be the best choice. The issue is in the
details, as there is a hodgepodge of what language runtimes support and of
what the limits of transpiling from a language to WebAssembly are.
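The capability side of choosing WASI (the host hands the guest specific functions such as a clock or a random source, which is also what Sean's earlier point about the capability model relies on) can be illustrated with a toy linker. This is a pure-Python simulation, not a Wasm runtime: the `(module, name)` import keys mirror real WASI preview1 function names, but `host_capabilities`, `link` and the Python callables are hypothetical simplifications. The real `clock_time_get` and `random_get` take different arguments and write their results into guest memory.

```python
import os
import time
from typing import Callable, Dict, List, Tuple

ImportKey = Tuple[str, str]  # (module name, function name)


class LinkError(Exception):
    """Raised when a guest needs an import that the host did not grant."""


# Hypothetical guest module: the imports it declares. The names match real
# WASI preview1 functions, but the callables below use simplified signatures.
GUEST_IMPORTS: List[ImportKey] = [
    ("wasi_snapshot_preview1", "clock_time_get"),
    ("wasi_snapshot_preview1", "random_get"),
]


def host_capabilities(allow_clock: bool, allow_random: bool) -> Dict[ImportKey, Callable]:
    """The host decides which capabilities it is willing to hand out."""
    granted: Dict[ImportKey, Callable] = {}
    if allow_clock:
        granted[("wasi_snapshot_preview1", "clock_time_get")] = time.time_ns
    if allow_random:
        granted[("wasi_snapshot_preview1", "random_get")] = os.urandom
    return granted


def link(required: List[ImportKey], granted: Dict[ImportKey, Callable]) -> Dict[ImportKey, Callable]:
    """Resolve every declared import, or refuse to instantiate the guest."""
    missing = [key for key in required if key not in granted]
    if missing:
        raise LinkError(f"unresolved imports: {missing}")
    return {key: granted[key] for key in required}


imports = link(GUEST_IMPORTS, host_capabilities(allow_clock=True, allow_random=True))
print(len(imports[("wasi_snapshot_preview1", "random_get")](16)))  # 16

try:
    link(GUEST_IMPORTS, host_capabilities(allow_clock=True, allow_random=False))
except LinkError as err:
    print(err)
```

Mirroring the default behaviour of real runtimes, a guest with an unresolved import is never instantiated here, so a host that does not grant `random_get` simply cannot run a module that asks for it.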

Assuming WASI then it breaks down to these two aspects:
1) Does the host language have a runtime?
Java: https://github.com/wasmerio/wasmer-java
Python: https://github.com/wasmerio/wasmer-python
Go: https://github.com/wasmerio/wasmer-go

2) How good is compilation from source language to WebAssembly
<https://github.com/appcypher/awesome-wasm-langs>?
Java (very limited):
Issues with garbage collection and the need to transpile/replace much of
the VM's capabilities, plus the large standard library that everyone uses,
cause a lot of challenges.
JWebAssembly can do simple things like basic classes, strings, and method
calls. It should be able to compile trivial lambdas to Wasm. There are other
choices, but to my knowledge all are very limited.

Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
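Feature                  | CPython/Emscripten (browser) | CPython/Emscripten (Node) | Pyodide
-------------------------+------------------------------+---------------------------+-------------------------
subprocess (fork, exec)  | no                           | no                        | no
threads                  | no                           | YES                       | WIP
file system              | no (only MEMFS)              | YES (Node raw FS)         | YES (IDB, Node, …)
shared extension modules | WIP                          | WIP                       | YES
PyPI packages            | no                           | no                        | YES
sockets                  | ?                            | ?                         | ?
urllib, asyncio          | no                           | no                        | WebAPI fetch / WebSocket
signals                  | no                           | WIP                       | YES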

Go (excellent): native support in the Go compiler
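Luke's earlier point (his Jun 29 message, quoted below) that batching reduces how often data has to cross between host and guest can be made concrete with a toy count. This is a hypothetical pure-Python sketch: `CountingBoundary`, `double_all` and `process` are illustrative names, no Wasm runtime is involved, and the crossing count is only a proxy for the real per-call copy and transcoding cost.

```python
from typing import Callable, Iterable, List


class CountingBoundary:
    """Wraps a (hypothetical) guest function and counts host->guest calls.

    Each call stands for one copy into guest memory and one copy back out.
    """

    def __init__(self, guest_fn: Callable[[List[int]], List[int]]):
        self.guest_fn = guest_fn
        self.crossings = 0

    def call(self, batch: List[int]) -> List[int]:
        self.crossings += 1
        return self.guest_fn(list(batch))


def double_all(xs: List[int]) -> List[int]:
    # Hypothetical guest UDF that works on a whole batch at once.
    return [2 * x for x in xs]


def process(elements: Iterable[int], boundary: CountingBoundary, batch_size: int) -> List[int]:
    """Buffer elements and cross the boundary once per full (or final) batch."""
    out: List[int] = []
    batch: List[int] = []
    for x in elements:
        batch.append(x)
        if len(batch) == batch_size:
            out.extend(boundary.call(batch))
            batch = []
    if batch:  # flush the remainder
        out.extend(boundary.call(batch))
    return out


per_element = CountingBoundary(double_all)
batched = CountingBoundary(double_all)
assert process(range(1000), per_element, 1) == process(range(1000), batched, 256)
print(per_element.crossings, batched.crossings)  # 1000 4
```

Batching leaves the amount of data copied unchanged and only cuts the number of calls; fusing transforms, the other point Luke raises, would additionally remove the crossings between adjacent transforms.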

On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

>
>
> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>
>> I have had interest in integrating Wasm within Beam as well as I have had
>> a lot of interest in improving language portability.
>>
>> Wasm has a lot of benefits over using Docker containers to provide a
>> place for code to execute. From experience working on Beam's portability
>> layer and from internal Flume knowledge:
>> * encoding and decoding data is expensive; anything which ensures that
>> in-memory representations of data can be transferred from the host to the
>> guest and back without transcoding/re-interpreting will be a big win.
>> * reducing the number of times we need to pass data between guest and
>> host and back is important
>>   * fusing transforms reduces the number of data passing points
>>   * batching (row or columnar) data reduces the number of times we need
>> to pass data at each data passing point
>> * there are enough complicated use cases (state & timers, large
>> iterables, side inputs) that handling only the trivial map/flatmap use
>> case will provide little value, since it will prevent fusion
>>
>> I have been meaning to work on a prototype where we replace the current
>> gRPC + Docker path with one in which we use Wasm to execute a fused graph,
>> re-using large parts of the existing code base written to support
>> portability.
>>
>
> This sounds very interesting. Using Wasm to implement proper UDF support
> for x-lang (for example, executing Python timestamp/watermark functions
> provided through the Kafka Python x-lang wrapper on the Java Kafka
> transform) would probably be a good first target? My main question at
> this point is whether Wasm has adequate support for the existing SDKs
> that use x-lang to implement this in a useful way.
>
> Thanks,
> Cham
>
>
>>
>>
>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
>>> [1]. I'm trying to move us in that direction with
>>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
>>> Future Work section). This gives the Python SDK a concept of batches of
>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>> Arrow RecordBatches.
>>>
>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>>> for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>
>>> Brian
>>>
>>> [1]
>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>
>>>
>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <je...@google.com>
>>> wrote:
>>>
>>>> Interesting.
>>>>
>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>> "golang wasm" :)
>>>>
>>>> The storage and execution grid systems are all embracing wasm in some
>>>> way.
>>>>
>>>> https://redpanda.com/
>>>> https://www.fluvio.io/
>>>> https://temporal.io/ (a Cadence fork by the Cadence folks; I met Maxim,
>>>> the lead at Temporal, at the 2020 Wasm Summit)
>>>> https://github.com/pachyderm/pachyderm (no mention of Wasm yet)
>>>>
>>>> Keep the Wasm+Beam demos coming.
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>> sjvanrossum@google.com> wrote:
>>>>
>>>>> I caught up with all the replies through the web interface, but I
>>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>>> code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>>> come through until a bit later yesterday I think.
>>>>>
>>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>>> for Wasm transforms and it's something I thought about exploring when I was
>>>>> adding serialization/deserialization of complex (meaning anything that's
>>>>> not an integer or float in the context of Wasm) data types in the demo.
>>>>> It's an unfortunate bit of overhead which could very well be solved with
>>>>> Arrow and shared memory between Wasm modules.
>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>>> the same overhead when moving data into and out of the guest context so
>>>>> maybe it's negligible, but I haven't done any serious benchmarks yet to
>>>>> validate that.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>>> wrote:
>>>>>
>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>>>
>>>>>>
>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>
>>>>>> (Among many articles for the last few years)
>>>>>>
>>>>>> Robert Burke
>>>>>> Beam Go Busybody
>>>>>>
>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <je...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>>> was recorded. :)
>>>>>>>
>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>> rough.
>>>>>>>
>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>> most compelling use is in the portability of logic between data processing
>>>>>>> systems, especially for probabilistic data structures like Bloom
>>>>>>> filters, Count-Min Sketch, and HyperLogLog, where it is nice to persist the
>>>>>>> data structure and use it on a different system: for example, generating a
>>>>>>> Bloom filter in Beam and using it inside a BQ query without having to
>>>>>>> reimplement and test it across many platforms.
>>>>>>>
>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>>>> support via Wasmtime <https://github.com/bytecodealliance/wasmtime>
>>>>>>> is really good. There are *many* options for execution environments; one
>>>>>>> of the downsides of going through JS is string and number
>>>>>>> support (float/int64) issues, afaik. I could be wrong; maybe JS has fixed
>>>>>>> all this by now.
>>>>>>>
>>>>>>> The qualities in order of importance (for me) are
>>>>>>>
>>>>>>>    1. Portability, run the same code everywhere
>>>>>>>    2. Security, memory safety for the caller. Running Wasm inside
>>>>>>>    of Python should never crash your Python interpreter. The capability model
>>>>>>>    ensures that the Wasm module can only do what you allow it to
>>>>>>>    3. Performance (portable), compile once and run everywhere
>>>>>>>    within some margin of native.  Python makes this look good :)
>>>>>>>
>>>>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>>>>> around via Beam, so that Beam is now mostly in the control plane and
>>>>>>> computation happens in Wasm. This should reduce the serialization overhead
>>>>>>> and also get Python out of the data path.
>>>>>>>
>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>
>>>>>>> Another possibly interesting avenue to explore is compiling command-line
>>>>>>> programs to WASI (WebAssembly System Interface), the POSIX-like shim,
>>>>>>> so that they can be run in-process without the fork/exec/pipe overhead of
>>>>>>> running a subprocess. A neat demo might be running something like jq
>>>>>>> <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>>>
>>>>>>> Wasm is not a Python-only technology, either; it can be used from
>>>>>>> Java/the JVM via
>>>>>>>
>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>
>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>
>>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>>> the best place to run WASM.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back
>>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>>>>>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>>>>>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>>>>>>>> explored.
>>>>>>>>>>
>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>> branch? : )
>>>>>>>>>>>> Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>>> PR as a way to showcase the work, its usefulness, and receive comments on
>>>>>>>>>>>> this thread once we can see something more specific.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK as
>>>>>>>>>>>>> a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>>>> sjvanrossum@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>>>
>>>>>>>>>>>>
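Steven's description of the proof of concept (the host copies the encoded element into guest memory, the guest copies those bytes elsewhere in its linear memory, the guest calls back with offset and size, and the host copies and decodes) can be sketched without a real Wasm runtime. This is a minimal sketch under stated assumptions, not the code in the beam-wasm repository: a Python `bytearray` stands in for the guest's linear memory, a naive bump allocator stands in for the guest's allocator, and a `struct`-based encoding of an `(int64, str)` pair stands in for RowCoder. `SimulatedGuest`, `encode`, `decode` and `run_element` are hypothetical names.

```python
import struct

# Hypothetical stand-in for RowCoder: little-endian int64 id, uint32 length, UTF-8 name.
HEADER = struct.Struct("<qI")


def encode(row):
    row_id, name = row
    data = name.encode("utf-8")
    return HEADER.pack(row_id, len(data)) + data


def decode(buf):
    row_id, length = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    return (row_id, bytes(buf[start:start + length]).decode("utf-8"))


class SimulatedGuest:
    """Simulated Wasm guest: a bytearray plays the role of its linear memory.

    A real module would be instantiated by a runtime such as Wasmer or
    Wasmtime; none of that is modelled here.
    """

    def __init__(self, size=1024):
        self.memory = bytearray(size)
        self.next_free = 0  # naive bump allocator that never frees
        self.emit = None  # host callback "imported" by the guest

    def alloc(self, n):
        offset = self.next_free
        if offset + n > len(self.memory):
            raise MemoryError("guest linear memory exhausted")
        self.next_free += n
        return offset

    def process(self, in_offset, size):
        # The guest copies the input bytes elsewhere in its own linear memory...
        out_offset = self.alloc(size)
        self.memory[out_offset:out_offset + size] = self.memory[in_offset:in_offset + size]
        # ...and calls back into the host with (offset, size).
        self.emit(out_offset, size)


def run_element(row):
    outputs = []
    guest = SimulatedGuest()

    def emit(offset, size):
        # The host copies the bytes out of guest memory and decodes them.
        outputs.append(decode(guest.memory[offset:offset + size]))

    guest.emit = emit
    # The host encodes the element and copies it into guest memory.
    data = encode(row)
    in_offset = guest.alloc(len(data))
    guest.memory[in_offset:in_offset + len(data)] = data
    guest.process(in_offset, len(data))
    return outputs
```

For example, `run_element((7, "beam"))` returns `[(7, 'beam')]`. Every element is encoded, copied in, copied again inside the guest, copied out and decoded, which is the per-element overhead that the later replies suggest Arrow or shared memory might reduce.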

Re: Fun with WebAssembly transforms

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:

> I have been interested in integrating Wasm with Beam, and more broadly in
> improving language portability.
>
> Wasm has a lot of benefits over using Docker containers to provide a place
> for code to execute. From my experience working on Beam's portability
> layer and from internal Flume knowledge:
> * encoding and decoding data is expensive; anything which ensures that
> in-memory representations can be transferred from the host to the
> guest and back without transcoding/re-interpreting will be a big win.
> * reducing the number of times we need to pass data between guest and host
> and back is important
>   * fusing transforms reduces the number of data passing points
>   * batching (row or columnar) data reduces the number of times we need to
> pass data at each data passing point
> * there are enough complicated use cases (state & timers, large iterables,
> side inputs) that handling only the trivial map/flatmap use case will
> provide little value, since it would prevent fusion
>
> I have been meaning to work on a prototype where we replace the current
> gRPC + docker path with one in which we use Wasm to execute a fused graph
> re-using large parts of the existing code base written to support
> portability.
>

This sounds very interesting. Using Wasm to implement proper UDF support for
x-lang (for example, executing Python timestamp/watermark functions provided
through the Kafka Python x-lang wrapper on the Java Kafka transform) would
probably be a good first target? My main question at this point is whether
Wasm has adequate support for the existing SDKs that use x-lang to implement
this in a useful way.

Thanks,
Cham


>
>
> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com> wrote:
>
>> Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
>> [1]. I'm trying to move us in that direction with
>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
>> Future Work section). This gives the Python SDK a concept of batches of
>> logical elements. My goal is Beam schemas + batches of logical elements ->
>> Arrow RecordBatches.
>>
>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>> for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>
>> Brian
>>
>> [1]
>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>
>>
>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <je...@google.com>
>> wrote:
>>
>>> Interesting.
>>>
>>> Robert, I was just served an ad for Redpanda when I searched for "golang
>>> wasm" :)
>>>
>>> The storage and execution grid systems are all embracing wasm in some
>>> way.
>>>
>>> https://redpanda.com/
>>> https://www.fluvio.io/
>>> https://temporal.io/ (Cadence fork by the Cadence folks, I met Maxim
>>> the lead at Temporal at the 2020 Wasm Summit)
>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>
>>> Keep the Wasm+Beam demos coming.
>>>
>>> Sean
>>>
>>>
>>>
>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>> sjvanrossum@google.com> wrote:
>>>
>>>> I caught up with all the replies through the web interface, but I
>>>> didn't have my list subscription set up correctly so my reply (TL;DR sample
>>>> code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>> come through until a bit later yesterday I think.
>>>>
>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>> for Wasm transforms and it's something I thought about exploring when I was
>>>> adding serialization/deserialization of complex (meaning anything that's
>>>> not an integer or float in the context of Wasm) data types in the demo.
>>>> It's an unfortunate bit of overhead which could very well be solved with
>>>> Arrow and shared memory between Wasm modules.
>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>> the same overhead when moving data into and out of the guest context so
>>>> maybe it's negligible, but I haven't done any serious benchmarks yet to
>>>> validate that.
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com>
>>>> wrote:
>>>>
>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>>
>>>>>
>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>
>>>>> (Among many articles for the last few years)
>>>>>
>>>>> Robert Burke
>>>>> Beam Go Busybody
>>>>>
>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <je...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>> was recorded. :)
>>>>>>
>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>> rough.
>>>>>>
>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>> most compelling use is in the portability of logic between data processing
>>>>>> systems, especially for probabilistic data structures like Bloom
>>>>>> filters, Count-Min Sketch, and HyperLogLog, where it is nice to persist the
>>>>>> data structure and use it on a different system: for example, generating a
>>>>>> Bloom filter in Beam and using it inside a BQ query without having to
>>>>>> reimplement and test it across many platforms.
>>>>>>
>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>>> support via Wasmtime <https://github.com/bytecodealliance/wasmtime>
>>>>>> is really good. There are *many* options for execution environments; one
>>>>>> of the downsides of going through JS is string and number
>>>>>> support (float/int64) issues, afaik. I could be wrong; maybe JS has fixed
>>>>>> all this by now.
>>>>>>
>>>>>> The qualities in order of importance (for me) are
>>>>>>
>>>>>>    1. Portability, run the same code everywhere
>>>>>>    2. Security, memory safety for the caller. Running Wasm inside of
>>>>>>    Python should never crash your Python interpreter. The capability model
>>>>>>    ensures that the Wasm module can only do what you allow it to
>>>>>>    3. Performance (portable), compile once and run everywhere within
>>>>>>    some margin of native.  Python makes this look good :)
>>>>>>
>>>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>>>> around via Beam, so that Beam is now mostly in the control plane and
>>>>>> computation happens in Wasm. This should reduce the serialization overhead
>>>>>> and also get Python out of the data path.
>>>>>>
>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>
>>>>>> Another possibly interesting avenue to explore is compiling command-line
>>>>>> programs to WASI (WebAssembly System Interface), the POSIX-like shim,
>>>>>> so that they can be run in-process without the fork/exec/pipe overhead of
>>>>>> running a subprocess. A neat demo might be running something like jq
>>>>>> <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>>
>>>>>> Wasm is not a Python-only technology, either; it can be used from
>>>>>> Java/the JVM via
>>>>>>
>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>> the best place to run WASM.
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back
>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>>>>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>>>>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>>>>>>> explored.
>>>>>>>>>
>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>> branch? : )
>>>>>>>>>>> Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>> PR as a way to showcase the work, its usefulness, and receive comments on
>>>>>>>>>>> this thread once we can see something more specific.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK as
>>>>>>>>>>>> a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Steve
>>>>>>>>>>>>
>>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>>> sjvanrossum@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>>
>>>>>>>>>>>

Re: Fun with WebAssembly transforms

Posted by Luke Cwik <lc...@google.com>.
I have been interested in integrating Wasm with Beam, and more broadly in
improving language portability.

Wasm has a lot of benefits over using Docker containers to provide a place
for code to execute. From my experience working on Beam's portability layer
and from internal Flume knowledge:
* encoding and decoding data is expensive; anything which ensures that
in-memory representations can be transferred from the host to the
guest and back without transcoding/re-interpreting will be a big win.
* reducing the number of times we need to pass data between guest and host
and back is important
  * fusing transforms reduces the number of data passing points
  * batching (row or columnar) data reduces the number of times we need to
pass data at each data passing point
* there are enough complicated use cases (state & timers, large iterables,
side inputs) that handling only the trivial map/flatmap use case will provide
little value, since it would prevent fusion

I have been meaning to work on a prototype where we replace the current
gRPC + docker path with one in which we use Wasm to execute a fused graph
re-using large parts of the existing code base written to support
portability.
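Luke's points about fusion and batching can be made concrete by counting host/guest round trips. The sketch below is a toy model, not Beam's portability code: `Boundary`, `double`, `add_one`, `fused` and `run` are hypothetical names, each `Boundary.call` stands for one host-to-guest-and-back transition, and the "guest" functions operate on lists so the same code covers per-element execution (batch size 1) and batched execution.

```python
class Boundary:
    """Counts host<->guest round trips; each call() models one transition."""

    def __init__(self):
        self.round_trips = 0

    def call(self, guest_fn, batch):
        self.round_trips += 1
        return guest_fn(batch)


# Two toy "guest" transforms that operate on a batch (list) of elements.
def double(batch):
    return [2 * x for x in batch]


def add_one(batch):
    return [x + 1 for x in batch]


def fused(batch):
    # Both transforms run inside a single guest call.
    return add_one(double(batch))


def run(elements, stages, batch_size, boundary):
    out = []
    for start in range(0, len(elements), batch_size):
        batch = elements[start:start + batch_size]
        for stage in stages:
            batch = boundary.call(stage, batch)
        out.extend(batch)
    return out


elements = list(range(1000))

# Unfused, one element at a time: 1000 batches x 2 stages.
per_element = Boundary()
unfused_out = run(elements, [double, add_one], batch_size=1, boundary=per_element)

# Fused and batched: 10 batches x 1 stage.
batched = Boundary()
fused_out = run(elements, [fused], batch_size=100, boundary=batched)
```

Both runs produce the same output, but `per_element.round_trips` is 2000 while `batched.round_trips` is 10. It also hints at the last bullet: if a stage cannot run in the guest (for example because it needs state, timers or side inputs), it cannot join the fused call and adds crossings back.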


On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bh...@google.com> wrote:

> Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
> [1]. I'm trying to move us in that direction with
> https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
> Future Work section). This gives the Python SDK a concept of batches of
> logical elements. My goal is Beam schemas + batches of logical elements ->
> Arrow RecordBatches.
>
> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut and
> I'm currently working on adding what I'm calling a "BatchConverter" [2] for
> Beam Rows -> Arrow RecordBatch. Once that's done it could be interesting to
> experiment with a "WasmDoFn" that uses Arrow for interchange.
>
> Brian
>
> [1]
> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
> [2]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>
>
> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <je...@google.com>
> wrote:
>
>> Interesting.
>>
>> Robert, I was just served an ad for Redpanda when I searched for "golang
>> wasm" :)
>>
>> The storage and execution grid systems are all embracing wasm in some way.
>>
>> https://redpanda.com/
>> https://www.fluvio.io/
>> https://temporal.io/ (Cadence fork by the Cadence folks, I met Maxim the
>> lead at Temporal at the 2020 Wasm Summit)
>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>
>> Keep the Wasm+Beam demos coming.
>>
>> Sean
>>
>>
>>
>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <sj...@google.com>
>> wrote:
>>
>>> I caught up with all the replies through the web interface, but I didn't
>>> have my list subscription set up correctly so my reply (TL;DR sample code
>>> available at https://github.com/sjvanrossum/beam-wasm) didn't come
>>> through until a bit later yesterday I think.
>>>
>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>> for Wasm transforms and it's something I thought about exploring when I was
>>> adding serialization/deserialization of complex (meaning anything that's
>>> not an integer or float in the context of Wasm) data types in the demo.
>>> It's an unfortunate bit of overhead which could very well be solved with
>>> Arrow and shared memory between Wasm modules.
>>> I've seen Wasm transforms pop up in a few other places, notably in
>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>> the same overhead when moving data into and out of the guest context so
>>> maybe it's negligible, but I haven't done any serious benchmarks yet to
>>> validate that.
>>>
>>> Regards,
>>>
>>> Steve
>>>
>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> Obligatory mention that WASM is basically an architecture that any
>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>
>>>>
>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>
>>>> (Among many articles for the last few years)
>>>>
>>>> Robert Burke
>>>> Beam Go Busybody
>>>>
>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <je...@google.com>
>>>> wrote:
>>>>
>>>>> Heh, my stage fright was so strong, I didn't realize that the talk was
>>>>> recorded. :)
>>>>>
>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>> rough.
>>>>>
>>>>> I haven't explored Wasm in Beam much since that talk. I think the most
>>>>> compelling use is in the portability of logic between data processing
>>>>> systems, especially for probabilistic data structures like Bloom
>>>>> filters, Count-Min Sketch, and HyperLogLog, where it is nice to persist the
>>>>> data structure and use it on a different system: for example, generating a
>>>>> Bloom filter in Beam and using it inside a BQ query without having to
>>>>> reimplement and test it across many platforms.
>>>>>
>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>>> support via Wasmtime <https://github.com/bytecodealliance/wasmtime>
>>>>> is really good. There are *many* options for execution environments; one
>>>>> of the downsides of going through JS is string and number
>>>>> support (float/int64) issues, afaik. I could be wrong; maybe JS has fixed
>>>>> all this by now.
>>>>>
>>>>> The qualities in order of importance (for me) are
>>>>>
>>>>>    1. Portability, run the same code everywhere
>>>>>    2. Security, memory safety for the caller. Running Wasm inside of
>>>>>    Python should never crash your Python interpreter. The capability model
>>>>>    ensures that the Wasm module can only do what you allow it to
>>>>>    3. Performance (portable), compile once and run everywhere within
>>>>>    some margin of native.  Python makes this look good :)
>>>>>
>>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>>> around via Beam, so that Beam is now mostly in the control plane and
>>>>> computation happens in Wasm. This should reduce the serialization overhead
>>>>> and also get Python out of the data path.
>>>>>
>>>>> I see someone exploring Wasm+Arrow here,
>>>>> https://github.com/domoritz/arrow-wasm
>>>>>
>>>>> Another possibly interesting avenue to explore is compiling command-line
>>>>> programs to WASI (WebAssembly System Interface), the POSIX-like shim,
>>>>> so that they can be run in-process without the fork/exec/pipe overhead of
>>>>> running a subprocess. A neat demo might be running something like jq
>>>>> <https://stedolan.github.io/jq/> inside a Beam job.
>>>>>
>>>>> Wasm is not a Python-only technology, either; it can be used from
>>>>> Java/the JVM via
>>>>>
>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>>> wrote:
>>>>>
>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>
>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>> the best place to run WASM.
>>>>>>>
>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in
>>>>>>>> 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>>>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>>>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>>>>>> explored.
>>>>>>>>
>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>>>>>>>>> WebAssembly topic.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Would you open a pull request for it? Or at least share a branch?
>>>>>>>>>> : )
>>>>>>>>>> Even if we don't want to merge it, it would be great to have a PR
>>>>>>>>>> as a way to showcase the work, its usefulness, and receive comments on this
>>>>>>>>>> thread once we can see something more specific.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>> sjvanrossum@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>> to implement cross-language transforms for languages which don't (yet) have
>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>
>>>>>>>>>>> I've got a small proof of concept running in the Python SDK as a
>>>>>>>>>>> DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>
>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with this,
>>>>>>>>>>> since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Steve
>>>>>>>>>>>
>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>> sjvanrossum@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>
>>>>>>>>>>
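The round trip described in the original message can be sketched without any Wasm runtime. In that flow, the host copies the encoded element into guest memory. The guest copies it elsewhere in its linear memory and calls back with the offset and size. The host then copies the bytes out and decodes them. This is a minimal illustration under stated assumptions, not the code from the proof of concept. A `bytearray` stands in for linear memory, and `SimulatedGuest`, `run_element`, `encode_row` and `decode_row` are hypothetical names. The toy length-prefixed encoding is not Beam's RowCoder. How Wasmer or Wasmtime expose memory and host imports is not shown.

```python
import struct
from typing import Callable, List, Tuple

PAGE_SIZE = 64 * 1024  # Wasm linear memory is sized in 64 KiB pages


def encode_row(name: str, count: int) -> bytes:
    """Toy stand-in for a row coder (NOT Beam's RowCoder): u32 length + UTF-8 + i64."""
    raw = name.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw + struct.pack("<q", count)


def decode_row(data: bytes) -> Tuple[str, int]:
    (n,) = struct.unpack_from("<I", data, 0)
    name = data[4:4 + n].decode("utf-8")
    (count,) = struct.unpack_from("<q", data, 4 + n)
    return name, count


class SimulatedGuest:
    """Hypothetical pure-Python stand-in for a Wasm instance."""

    def __init__(self, emit: Callable[[int, int], None]) -> None:
        self.memory = bytearray(PAGE_SIZE)  # one page of "linear memory"
        self._next_free = 0
        self._emit = emit  # the host import the guest calls back into

    def alloc(self, size: int) -> int:
        # Bump allocator: hands out increasing offsets and never frees.
        offset = self._next_free
        if offset + size > len(self.memory):
            raise MemoryError("out of linear memory")
        self._next_free += size
        return offset

    def process(self, offset: int, size: int) -> None:
        # Guest side: copy the input elsewhere in linear memory,
        # then report the new (offset, size) to the host.
        dest = self.alloc(size)
        self.memory[dest:dest + size] = self.memory[offset:offset + size]
        self._emit(dest, size)


def run_element(name: str, count: int) -> List[Tuple[str, int]]:
    outputs: List[Tuple[str, int]] = []

    def emit(offset: int, size: int) -> None:
        # Host side: copy the bytes out of guest memory and decode them.
        outputs.append(decode_row(bytes(guest.memory[offset:offset + size])))

    guest = SimulatedGuest(emit)
    encoded = encode_row(name, count)
    in_offset = guest.alloc(len(encoded))
    guest.memory[in_offset:in_offset + len(encoded)] = encoded  # host -> guest copy
    guest.process(in_offset, len(encoded))
    return outputs


print(run_element("beam", 42))  # [('beam', 42)]
```

Each element crosses the host/guest boundary as bytes in both directions. That is the kind of copying and serialization overhead that later replies suggest Arrow and shared memory might reduce.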

Re: Fun with WebAssembly transforms

Posted by Brian Hulette <bh...@google.com>.
Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
[1]. I'm trying to move us in that direction with
https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
Future Work section). This gives the Python SDK a concept of batches of
logical elements. My goal is Beam schemas + batches of logical elements ->
Arrow RecordBatches.

The Batched DoFn infrastructure is stable as of the 2.40.0 release cut, and
I'm currently working on adding what I'm calling a "BatchConverter" [2] for
Beam Rows -> Arrow RecordBatch. Once that's done, it could be interesting to
experiment with a "WasmDoFn" that uses Arrow for interchange.

Brian

[1]
https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
[2]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
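The following plain-Python sketch gives a rough idea of what a Rows -> RecordBatch conversion involves at its core. It transposes row-oriented records into one list per field and back again. It does not use Beam's `BatchConverter` API or `pyarrow`. `Purchase`, `rows_to_columns` and `columns_to_rows` are hypothetical names. A real Arrow RecordBatch would also store each column in typed, contiguous buffers.

```python
from typing import Any, Dict, List, NamedTuple


class Purchase(NamedTuple):
    # Hypothetical schema'd element standing in for a Beam Row.
    user: str
    amount: float


def rows_to_columns(rows: List[Any]) -> Dict[str, List[Any]]:
    """Row-major -> column-major: one list per schema field."""
    if not rows:
        return {}
    return {field: [getattr(row, field) for row in rows] for field in rows[0]._fields}


def columns_to_rows(cls: Any, columns: Dict[str, List[Any]]) -> List[Any]:
    """Column-major -> row-major: explode a batch back into logical elements."""
    if not columns:
        return []
    return [cls(*values) for values in zip(*(columns[f] for f in cls._fields))]


batch = rows_to_columns([Purchase("ann", 3.5), Purchase("bob", 1.25)])
print(batch)  # {'user': ['ann', 'bob'], 'amount': [3.5, 1.25]}
```

The column-major layout is what lets a batch be handed across a boundary as a few large buffers instead of one small encoded blob per element.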


On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <je...@google.com>
wrote:

> Interesting.
>
> Robert, I was just served an ad for Redpanda when I searched for "golang
> wasm" :)
>
> The storage and execution grid systems are all embracing wasm in some way.
>
> https://redpanda.com/
> https://www.fluvio.io/
> https://temporal.io/ (a Cadence fork by the Cadence folks; I met Maxim, the
> lead at Temporal, at the 2020 Wasm Summit)
> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>
> Keep the Wasm+Beam demos coming.
>
> Sean
>
>
>
> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <sj...@google.com>
> wrote:
>
>> I caught up with all the replies through the web interface, but I didn't
>> have my list subscription set up correctly, so my reply (TL;DR: sample code
>> available at https://github.com/sjvanrossum/beam-wasm) didn't come
>> through until a bit later yesterday, I think.
>>
>> Sean, I agree with your suggestion of Arrow as the interchange format for
>> Wasm transforms, and it's something I thought about exploring when I was
>> adding serialization/deserialization of complex (meaning anything that's
>> not an integer or float in the context of Wasm) data types in the demo.
>> It's an unfortunate bit of overhead which could very well be solved with
>> Arrow and shared memory between Wasm modules.
>> I've seen Wasm transforms pop up in a few other places, notably in
>> streaming data platforms like Fluvio and Redpanda, and they seem to incur
>> the same overhead when moving data into and out of the guest context, so
>> maybe it's negligible, but I haven't done any serious benchmarks yet to
>> validate that.
>>
>> Regards,
>>
>> Steve
>>
>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <ro...@frantil.com> wrote:
>>
>>> Obligatory mention that WASM is basically an architecture that any
>>> well-meaning compiler can target, e.g. the Go compiler:
>>>
>>>
>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>
>>> (Among many articles for the last few years)
>>>
>>> Robert Burke
>>> Beam Go Busybody
>>>
>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <je...@google.com>
>>> wrote:
>>>
>>>> Heh, my stage fright was so strong, I didn't realize that the talk was
>>>> recorded. :)
>>>>
>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit rough.
>>>>
>>>> I haven't explored Wasm in Beam much since that talk. I think the most
>>>> compelling use is in the portability of logic between data processing
>>>> systems, especially for probabilistic data structures like Bloom
>>>> filters, Count-Min sketches and HyperLogLog, where it is nice to persist
>>>> the data structure and use it on a different system: for example,
>>>> generating a Bloom filter in Beam and using it inside a BQ query without
>>>> having to reimplement and test it across many platforms.
>>>>
>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>> way to disable it. So it is supported in Deno/Node as well. In Python, Wasm
>>>> support via Wasmtime <https://github.com/bytecodealliance/wasmtime> is
>>>> really good. There are *many* options for execution environments. One
>>>> downside of passing through JS is string and number support (float/int64)
>>>> issues, afaik. I could be wrong; maybe JS has fixed all this by now.
>>>>
>>>> The qualities in order of importance (for me) are
>>>>
>>>>    1. Portability: run the same code everywhere.
>>>>    2. Security: memory safety for the caller. Running Wasm inside of
>>>>    Python should never crash your Python interpreter. The capability model
>>>>    ensures that the Wasm module can only do what you allow it to.
>>>>    3. Performance (portable): compile once and run everywhere within
>>>>    some margin of native. Python makes this look good :)
>>>>
>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>> around via Beam, so that Beam is now mostly in the control plane and
>>>> computation happens in Wasm; this should reduce the serialization overhead
>>>> and also get Python out of the data path.
>>>>
>>>> I see someone exploring Wasm+Arrow here,
>>>> https://github.com/domoritz/arrow-wasm
>>>>
>>>> Another possibly interesting avenue to explore is compiling command
>>>> line programs to WASI (WebAssembly System Interface), the POSIX-like shim,
>>>> so that they can be run in-process without the fork/exec/pipe overhead of
>>>> running a subprocess. A neat demo might be running something like jq
>>>> <https://stedolan.github.io/jq/> inside of a Beam job.
>>>>
>>>> Not to make Wasm sound like a Python-only technology: it can also be used
>>>> from Java/the JVM via
>>>>
>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> adding Steven in case he didn't get the replies : )
>>>>>
>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> If we ever do anything with the JS runtime, this would seem to be the
>>>>>> best place to run WASM.
>>>>>>
>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in
>>>>>>> 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>>>>> explored.
>>>>>>>
>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>>>>>>>> WebAssembly topic.
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Would you open a pull request for it? Or at least share a branch? : )
>>>>>>>>> Even if we don't want to merge it, it would be great to have a PR
>>>>>>>>> as a way to showcase the work, its usefulness, and receive comments on this
>>>>>>>>> thread once we can see something more specific.
>>>>>>>>>
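Sean's point about persisting a probabilistic data structure in one system and querying it in another only works if every implementation agrees on the exact bit layout and hash functions. Below is a minimal sketch built on our own choices, not ones from the thread: SHA-256 as the hash, double hashing to derive the k bit indexes, and a little-endian header. Because the layout depends only on these documented choices, the bytes from `to_bytes()` could in principle be rebuilt by a Wasm module compiled from the same logic. Python's built-in `hash()` would not work here, because string hashing is randomized per process. `PortableBloomFilter` is a hypothetical class.

```python
import hashlib
import struct
from typing import Iterator, Optional


class PortableBloomFilter:
    """Bloom filter with a fully specified, platform-independent layout."""

    def __init__(self, num_bits: int, num_hashes: int,
                 bits: Optional[bytearray] = None) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _indexes(self, item: str) -> Iterator[int]:
        # Double hashing: index_i = (h1 + i * h2) mod m, both halves from SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1, h2 = struct.unpack_from("<QQ", digest)
        h2 |= 1  # odd step: nonzero, and coprime to a power-of-two num_bits
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))

    def to_bytes(self) -> bytes:
        # Header: num_bits and num_hashes as little-endian u32, then the bit array.
        return struct.pack("<II", self.num_bits, self.num_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "PortableBloomFilter":
        num_bits, num_hashes = struct.unpack_from("<II", data)
        return cls(num_bits, num_hashes, bytearray(data[8:]))


bf = PortableBloomFilter(num_bits=1024, num_hashes=3)
bf.add("alice")
restored = PortableBloomFilter.from_bytes(bf.to_bytes())
print(restored.might_contain("alice"))  # True
```

Any other implementation, such as a BigQuery UDF, would need to reproduce the same hash, index derivation and header format byte for byte to read the filter.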

Re: Fun with WebAssembly transforms

Posted by Sean Jensen-Grey <je...@google.com>.
Interesting.

Robert, I was just served an ad for Redpanda when I searched for "golang
wasm" :)

The storage and execution grid systems are all embracing wasm in some way.

https://redpanda.com/
https://www.fluvio.io/
https://temporal.io/ (a Cadence fork by the Cadence folks; I met Maxim, the
lead at Temporal, at the 2020 Wasm Summit)
https://github.com/pachyderm/pachyderm no mention of wasm, yet.

Keep the Wasm+Beam demos coming.

Sean
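Sean's June 15 message, quoted earlier in this thread, mentions float/int64 issues. A plain JavaScript Number is an IEEE-754 double, so not every 64-bit integer above 2**53 can be represented exactly. Modern JavaScript does have BigInt, but any path that converts an int64 to a Number can lose precision. The sketch below uses hypothetical helper names and shows the effect with Python floats, which are IEEE-754 doubles on mainstream platforms. It also shows one common workaround, carrying the value as a decimal string.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(value: int) -> bool:
    """True if an int64 value is unchanged after a round trip through a float64."""
    return int(float(value)) == value


def int64_to_wire(value: int) -> str:
    # Workaround sketch: send int64 values as decimal strings, parse on the other side.
    if not -(2**63) <= value < 2**63:
        raise OverflowError("not an int64")
    return str(value)


def int64_from_wire(text: str) -> int:
    return int(text)


print(survives_double(MAX_SAFE_INTEGER))  # True
print(survives_double(2**53 + 1))         # False: rounds to 2**53
```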



Re: Fun with WebAssembly transforms

Posted by Steven van Rossum <sj...@google.com>.
I caught up with all the replies through the web interface, but I didn't
have my list subscription set up correctly, so my reply (TL;DR: sample code
available at https://github.com/sjvanrossum/beam-wasm) didn't come through
until a bit later yesterday, I think.

Sean, I agree with your suggestion of Arrow as the interchange format for
Wasm transforms, and it's something I thought about exploring when I was
adding serialization/deserialization of complex (meaning anything that's
not an integer or float in the context of Wasm) data types in the demo.
It's an unfortunate bit of overhead which could very well be solved with
Arrow and shared memory between Wasm modules.
I've seen Wasm transforms pop up in a few other places, notably in
streaming data platforms like Fluvio and Redpanda, and they seem to incur
the same overhead when moving data into and out of the guest context, so
maybe it's negligible, but I haven't done any serious benchmarks yet to
validate that.

Regards,

Steve
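Steven notes that anything other than an integer or float has to be serialized. This is because Wasm functions exchange essentially numeric values (i32, i64, f32, f64), so a guest usually hands back complex values as an (offset, size) pair into its linear memory, as the callback in the demo does. One convention some guests use is to pack both 32-bit values into a single i64 result. That convention is assumed here rather than taken from the demo. The helper names are hypothetical, and the sketch assumes the host receives i64 results as signed integers.

```python
from typing import Tuple

U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def pack_offset_size(offset: int, size: int) -> int:
    """Pack two u32 values into one i64: high 32 bits = offset, low 32 bits = size."""
    if not (0 <= offset <= U32_MASK and 0 <= size <= U32_MASK):
        raise ValueError("offset and size must each fit in 32 bits")
    packed = (offset << 32) | size
    # Reinterpret as signed, assuming the host receives i64 results as signed ints.
    return packed - (1 << 64) if packed >= (1 << 63) else packed


def unpack_offset_size(value: int) -> Tuple[int, int]:
    """Inverse of pack_offset_size; accepts the signed i64 the host received."""
    unsigned = value & U64_MASK
    return unsigned >> 32, unsigned & U32_MASK


print(unpack_offset_size(pack_offset_size(1024, 37)))  # (1024, 37)
```

Packing saves a second callback argument or return value. The trade-off is a hard 4 GiB limit on both offset and size, which matches 32-bit linear memory anyway.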

Re: Fun with WebAssembly transforms

Posted by Robert Burke <ro...@frantil.com>.
Obligatory mention that WASM is basically an architecture that any
well-meaning compiler can target, e.g. the Go compiler:

https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/

(Among many articles for the last few years)

Robert Burke
Beam Go Busybody

On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <je...@google.com>
wrote:

> Heh, my stage fright was so strong, I didn't realize that the talk was
> recorded. :)
>
> Steven, I'd love to chat about Wasm in Beam. This email is a bit rough.
>
> I haven't explored Wasm in Beam much since that talk. I think the most
> compelling use is in the portability of logic between data processing
> systems. Esp in the use of probabilistic data structures like Bloom
> Filters, Count-Min-Sketch, HyperLogLog, where it is nice to persist the
> data structure and use it on a different system. Like generating a bloom
> filter in Beam and using it inside of a BQ query w/o having to reimplement
> and test across many platforms.
>
> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8 exists,
> Wasm support exists for free unless the embedder goes out of their way to
> disable it. So it is supported in Deno/Node as well. In Python, Wasm
> support via Wasmtime <https://github.com/bytecodealliance/wasmtime> is
> really good.  There are *many* options for execution environments, one of
> the downsides of passing through JS one is in string and number
> support(float/int64) issues, afaik. I could be wrong, maybe JS has fixed
> all this by now.
>
> The qualities in order of importance (for me) are
>
>    1. Portability, run the same code everywhere
>    2. Security, memory safety for the caller. Running Wasm inside of
>    Python should never crash your Python interpreter. The capability model
>    ensures that the Wasm module can only do what you allow it to
>    3. Performance (portable), compile once and run everywhere within some
>    margin of native.  Python makes this look good :)
>
> I think something worth exploring is moving opaque-ish Arrow objects
> around via Beam, so that Beam is now mostly in the control plane and
> computation happens in Wasm, this should reduce the serialization overhead
> and also get Python out of the datapath.
>
> I see someone exploring Wasm+Arrow here,
> https://github.com/domoritz/arrow-wasm
>
> Another possibly interesting avenue to explore is compiling command line
> programs to Wasi (WebAssembly System Interface), the POSIX like shim, so
> that they can be run inprocess without the fork/exec/pipe overhead of
> running a subprocess. A neat demo might be running something like Jq
> <https://stedolan.github.io/jq/> inside of a Beam job.
>
> Not to make Wasm sound like a Python only technology, it can be used via
> Java/JVM via
>
>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>    - https://github.com/kawamuray/wasmtime-java
>
> Sean
>
>
>
> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com> wrote:
>
>> adding Steven in case he didn't get the replies : )
>>
>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> If we ever do anything with the JS runtime, this would seem to be the
>>> best place to run WASM.
>>>
>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in
>>>> 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>> explored.
>>>>
>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>>>>> WebAssembly topic.
>>>>>
>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Would you open a pull request for it? Or at least share a branch? : )
>>>>>> Even if we don't want to merge it, it would be great to have a PR as
>>>>>> a way to showcase the work, its usefulness, and receive comments on this
>>>>>> thread once we can see something more specific.
>>>>>>
>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>> sjvanrossum@google.com> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I had some spare time yesterday and thought it'd be fun to implement
>>>>>>> a transform which runs WebAssembly modules as a lightweight way to
>>>>>>> implement cross language transforms for languages which don't (yet) have a
>>>>>>> SDK implementation.
>>>>>>>
>>>>>>> I've got a small proof of concept running in the Python SDK as a
>>>>>>> DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>> the guest's memory.
>>>>>>>
>>>>>>> Any thoughts/interest? I'm not sure where I was going with this,
>>>>>>> since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>
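
The host/guest flow from the original post (host copies the encoded element into guest memory, guest copies it elsewhere in its linear memory, guest calls back with offset and size, host copies out and decodes) can be illustrated without any Wasm runtime. The sketch below is a pure-Python simulation, not Wasmer's API and not Beam's RowCoder. GuestMemory, encode, decode, guest_process and host_process are hypothetical names. A bytearray stands in for the module's linear memory, and a toy struct-based encoding replaces the RowCoder.

```python
import struct
from typing import Callable, List, Tuple

Element = Tuple[int, str]


class GuestMemory:
    """Hypothetical stand-in for a Wasm module's linear memory (a flat byte buffer)."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)

    def _check(self, offset: int, size: int) -> None:
        # Real runtimes trap on out-of-bounds access; here we raise instead.
        if offset < 0 or size < 0 or offset + size > len(self.buf):
            raise IndexError("access outside linear memory")

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        self.buf[offset:offset + len(data)] = data

    def read(self, offset: int, size: int) -> bytes:
        self._check(offset, size)
        return bytes(self.buf[offset:offset + size])


def encode(element: Element) -> bytes:
    # Toy encoding (NOT Beam's RowCoder format): int64, uint32 length, UTF-8 bytes.
    number, text = element
    raw = text.encode("utf-8")
    return struct.pack(">qI", number, len(raw)) + raw


def decode(data: bytes) -> Element:
    number, length = struct.unpack_from(">qI", data, 0)
    return number, data[12:12 + length].decode("utf-8")


def guest_process(mem: GuestMemory, offset: int, size: int,
                  emit: Callable[[int, int], None]) -> None:
    # "Guest" side: copy the input elsewhere in linear memory, then call back
    # into the host with the new offset and size.
    out_offset = offset + size  # naive bump allocation right after the input
    mem.write(out_offset, mem.read(offset, size))
    emit(out_offset, size)


def host_process(element: Element, mem_size: int = 1024) -> List[Element]:
    mem = GuestMemory(mem_size)
    outputs: List[Element] = []

    def emit(offset: int, size: int) -> None:
        # Host callback: copy the bytes out of guest memory and decode them.
        outputs.append(decode(mem.read(offset, size)))

    data = encode(element)
    mem.write(0, data)  # host copies the encoded element into guest memory
    guest_process(mem, 0, len(data), emit)
    return outputs
```

For example, host_process((42, "hello")) returns [(42, "hello")]. In a real embedding, guest_process would be a function exported by the Wasm module and emit a host function passed in as an import. The copy into and out of the guest's memory is the part this simulation is meant to show.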

Re: Fun with WebAssembly transforms

Posted by Sean Jensen-Grey <je...@google.com>.
Heh, my stage fright was so strong, I didn't realize that the talk was
recorded. :)

Steven, I'd love to chat about Wasm in Beam. This email is a bit rough.

I haven't explored Wasm in Beam much since that talk. I think the most
compelling use is portability of logic between data processing systems,
especially for probabilistic data structures like Bloom filters,
Count-Min Sketch, and HyperLogLog, where it is useful to persist the data
structure and use it on a different system: for example, generating a Bloom
filter in Beam and using it inside a BigQuery (BQ) query without having to
reimplement and test it across many platforms.
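
To make the portability argument concrete: a persisted Bloom filter is only reusable elsewhere if every implementation agrees on the hashing rule and the byte layout. The following is a minimal Python sketch of that idea, not code from the talk or from any library. The SHA-256-based position derivation and the 8-byte header layout are assumptions chosen for illustration. The Wasm approach would compile one such implementation once and call it from both Beam and a BQ UDF, instead of keeping several ports in sync.

```python
import hashlib
import struct
from typing import Iterator, Optional


class BloomFilter:
    """Minimal Bloom filter with a byte-level serialization (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int,
                 bits: Optional[bytearray] = None) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions from SHA-256(i || item). Any implementation that
        # follows the same rule, in any language, computes the same positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(struct.pack(">I", i) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        # Header: num_bits and num_hashes as big-endian uint32, then the bit array.
        return struct.pack(">II", self.num_bits, self.num_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        num_bits, num_hashes = struct.unpack_from(">II", data, 0)
        return cls(num_bits, num_hashes, bytearray(data[8:]))
```

Usage: build a filter with BloomFilter(1024, 3), add(b"alice"), persist to_bytes() from one system, and load it with BloomFilter.from_bytes(...) in another; might_contain(b"alice") is then True on both sides.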

I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8 exists,
Wasm support comes for free unless the embedder goes out of their way to
disable it, so it is supported in Deno and Node as well. In Python, Wasm
support via Wasmtime <https://github.com/bytecodealliance/wasmtime> is
really good. There are *many* options for execution environments; one of
the downsides of going through JS is issues with string and number
(float/int64) support, AFAIK. I could be wrong; maybe JS has fixed all this
by now.
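
The float/int64 concern can be checked concretely. A JavaScript Number is an IEEE-754 double, and so is a Python float, so the plain-Python sketch below (survives_float64 is a hypothetical helper name) shows which integers survive a round trip through a double. Not every integer beyond 2**53 is representable, which is why 64-bit values need care at a JS boundary. Newer engines can expose Wasm i64 values to JavaScript as BigInt, which avoids the lossy conversion.

```python
# JavaScript's Number.MAX_SAFE_INTEGER; larger integers may not round-trip.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_float64(n: int) -> bool:
    # Python's float is an IEEE-754 double, like a JavaScript Number.
    return int(float(n)) == n


print(survives_float64(MAX_SAFE_INTEGER))  # True
print(survives_float64(2**53 + 1))         # False: rounds to 2**53
print(survives_float64(2**63 - 1))         # False: int64 max rounds to 2**63
```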

The qualities, in order of importance (for me), are:

   1. Portability: run the same code everywhere.
   2. Security: memory safety for the caller. Running Wasm inside Python
   should never crash your Python interpreter, and the capability model
   ensures that the Wasm module can only do what you allow it to.
   3. Performance (portable): compile once and run everywhere within some
   margin of native speed. Python makes this look good :)

I think something worth exploring is moving opaque-ish Arrow objects around
via Beam, so that Beam is mostly in the control plane and computation
happens in Wasm. This should reduce serialization overhead and also get
Python out of the data path.
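
A rough sketch of the "Beam in the control plane" idea follows. It assumes a batch of int64 values is packed into one contiguous buffer, so only a single opaque bytes object per batch crosses the Python boundary, and the per-value work happens elsewhere. It uses the standard library's array module as a stand-in. It is not the Arrow format or pyarrow, and the function names are hypothetical.

```python
import array
import sys
from typing import Iterable


def pack_int64_column(values: Iterable[int]) -> bytes:
    # One contiguous little-endian int64 buffer, similar in spirit to the values
    # buffer of an Arrow Int64 array (no validity bitmap, no IPC framing).
    column = array.array("q", values)
    if sys.byteorder != "little":
        column.byteswap()
    return column.tobytes()


def sum_int64_column(buffer: bytes) -> int:
    # Stand-in for logic that would run inside a Wasm module over the whole
    # batch; Python only hands over one opaque bytes object per batch.
    column = array.array("q")
    column.frombytes(buffer)
    if sys.byteorder != "little":
        column.byteswap()
    return sum(column)
```

With this layout a pipeline element would be one packed batch rather than one value, so per-element coder overhead is paid once per batch.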

I see someone exploring Wasm + Arrow here:
https://github.com/domoritz/arrow-wasm

Another possibly interesting avenue to explore is compiling command-line
programs to WASI (WebAssembly System Interface), the POSIX-like shim, so
that they can run in-process without the fork/exec/pipe overhead of
spawning a subprocess. A neat demo might be running something like jq
<https://stedolan.github.io/jq/> inside a Beam job.

Wasm isn't a Python-only technology, either; it can be used from Java/the
JVM via:

   - https://www.graalvm.org/22.1/reference-manual/wasm/
   - https://github.com/kawamuray/wasmtime-java

Sean



On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pa...@google.com> wrote:

> adding Steven in case he didn't get the replies : )
>
> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com>
> wrote:
>
>> If we ever do anything with the JS runtime, this would seem to be the
>> best place to run WASM.
>>
>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in 2020
>>> where he had integrated Rust with the Python SDK. I thought he used
>>> WebAssembly for that, but it looks like he used some other approaches, and
>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>> explored.
>>>
>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>
>>> Brian
>>>
>>>
>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>>>> WebAssembly topic.
>>>>
>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> Would you open a pull request for it? Or at least share a branch? : )
>>>>> Even if we don't want to merge it, it would be great to have a PR as a
>>>>> way to showcase the work, its usefulness, and receive comments on this
>>>>> thread once we can see something more specific.
>>>>>
>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>> sjvanrossum@google.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I had some spare time yesterday and thought it'd be fun to implement
>>>>>> a transform which runs WebAssembly modules as a lightweight way to
>>>>>> implement cross-language transforms for languages which don't (yet) have an
>>>>>> SDK implementation.
>>>>>>
>>>>>> I've got a small proof of concept running in the Python SDK as a DoFn
>>>>>> with Wasmer as the WebAssembly runtime and simple support for marshalling
>>>>>> between the host and guest environment with the RowCoder. The module I've
>>>>>> constructed is mostly useless, but demonstrates the host copying the
>>>>>> encoded element into the guest's memory, the guest copying those bytes
>>>>>> elsewhere in its linear memory buffer, the guest calling back to the host
>>>>>> with the offset and size and the host copying and decoding from the guest's
>>>>>> memory.
>>>>>>
>>>>>> Any thoughts/interest? I'm not sure where I was going with this,
>>>>>> since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>

Re: Fun with WebAssembly transforms

Posted by Pablo Estrada <pa...@google.com>.
adding Steven in case he didn't get the replies : )

On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dp...@google.com> wrote:

> If we ever do anything with the JS runtime, this would seem to be the best
> place to run WASM.
>
> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com> wrote:
>
>> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in 2020
>> where he had integrated Rust with the Python SDK. I thought he used
>> WebAssembly for that, but it looks like he used some other approaches, and
>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>> explored.
>>
>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>> https://github.com/seanjensengrey/beam-rust-python-java
>>
>> Brian
>>
>>
>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>>> WebAssembly topic.
>>>
>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>>
>>>> Would you open a pull request for it? Or at least share a branch? : )
>>>> Even if we don't want to merge it, it would be great to have a PR as a
>>>> way to showcase the work, its usefulness, and receive comments on this
>>>> thread once we can see something more specific.
>>>>
>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>> sjvanrossum@google.com> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I had some spare time yesterday and thought it'd be fun to implement a
>>>>> transform which runs WebAssembly modules as a lightweight way to implement
>>>>> cross-language transforms for languages which don't (yet) have an SDK
>>>>> implementation.
>>>>>
>>>>> I've got a small proof of concept running in the Python SDK as a DoFn
>>>>> with Wasmer as the WebAssembly runtime and simple support for marshalling
>>>>> between the host and guest environment with the RowCoder. The module I've
>>>>> constructed is mostly useless, but demonstrates the host copying the
>>>>> encoded element into the guest's memory, the guest copying those bytes
>>>>> elsewhere in its linear memory buffer, the guest calling back to the host
>>>>> with the offset and size and the host copying and decoding from the guest's
>>>>> memory.
>>>>>
>>>>> Any thoughts/interest? I'm not sure where I was going with this, since
>>>>> it was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but
>>>>> I can see a few use cases for this.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>

Re: Fun with WebAssembly transforms

Posted by Daniel Collins <dp...@google.com>.
If we ever do anything with the JS runtime, this would seem to be the best
place to run WASM.

On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bh...@google.com> wrote:

> FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in 2020
> where he had integrated Rust with the Python SDK. I thought he used
> WebAssembly for that, but it looks like he used some other approaches, and
> his talk mentioned WebAssembly as future work. Not sure if that was ever
> explored.
>
> https://www.youtube.com/watch?v=fZK_Tiu7q1o
> https://github.com/seanjensengrey/beam-rust-python-java
>
> Brian
>
>
> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:
>
>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
>> WebAssembly topic.
>>
>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com> wrote:
>>
>>> Would you open a pull request for it? Or at least share a branch? : )
>>> Even if we don't want to merge it, it would be great to have a PR as a
>>> way to showcase the work, its usefulness, and receive comments on this
>>> thread once we can see something more specific.
>>>
>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>> sjvanrossum@google.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I had some spare time yesterday and thought it'd be fun to implement a
>>>> transform which runs WebAssembly modules as a lightweight way to implement
>>>> cross-language transforms for languages which don't (yet) have an SDK
>>>> implementation.
>>>>
>>>> I've got a small proof of concept running in the Python SDK as a DoFn
>>>> with Wasmer as the WebAssembly runtime and simple support for marshalling
>>>> between the host and guest environment with the RowCoder. The module I've
>>>> constructed is mostly useless, but demonstrates the host copying the
>>>> encoded element into the guest's memory, the guest copying those bytes
>>>> elsewhere in its linear memory buffer, the guest calling back to the host
>>>> with the offset and size and the host copying and decoding from the guest's
>>>> memory.
>>>>
>>>> Any thoughts/interest? I'm not sure where I was going with this, since
>>>> it was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but
>>>> I can see a few use cases for this.
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>

Re: Fun with WebAssembly transforms

Posted by Brian Hulette <bh...@google.com>.
FYI: @Sean Jensen-Grey <je...@google.com> gave a talk back in 2020
where he had integrated Rust with the Python SDK. I thought he used
WebAssembly for that, but it looks like he used some other approaches, and
his talk mentioned WebAssembly as future work. Not sure if that was ever
explored.

https://www.youtube.com/watch?v=fZK_Tiu7q1o
https://github.com/seanjensengrey/beam-rust-python-java

Brian


On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:

> Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
> WebAssembly topic.
>
> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com> wrote:
>
>> Would you open a pull request for it? Or at least share a branch? : )
>> Even if we don't want to merge it, it would be great to have a PR as a
>> way to showcase the work, its usefulness, and receive comments on this
>> thread once we can see something more specific.
>>
>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <sj...@google.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I had some spare time yesterday and thought it'd be fun to implement a
>>> transform which runs WebAssembly modules as a lightweight way to implement
>>> cross-language transforms for languages which don't (yet) have an SDK
>>> implementation.
>>>
>>> I've got a small proof of concept running in the Python SDK as a DoFn
>>> with Wasmer as the WebAssembly runtime and simple support for marshalling
>>> between the host and guest environment with the RowCoder. The module I've
>>> constructed is mostly useless, but demonstrates the host copying the
>>> encoded element into the guest's memory, the guest copying those bytes
>>> elsewhere in its linear memory buffer, the guest calling back to the host
>>> with the offset and size and the host copying and decoding from the guest's
>>> memory.
>>>
>>> Any thoughts/interest? I'm not sure where I was going with this, since
>>> it was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but
>>> I can see a few use cases for this.
>>>
>>> Regards,
>>>
>>> Steve
>>>
>>

Re: Fun with WebAssembly transforms

Posted by Ahmet Altay <al...@google.com>.
Adding @Lukasz Cwik <lc...@google.com> - he was interested in the
WebAssembly topic.

On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pa...@google.com> wrote:

> Would you open a pull request for it? Or at least share a branch? : )
> Even if we don't want to merge it, it would be great to have a PR as a way
> to showcase the work, its usefulness, and receive comments on this thread
> once we can see something more specific.
>
> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <sj...@google.com>
> wrote:
>
>> Hi folks,
>>
>> I had some spare time yesterday and thought it'd be fun to implement a
>> transform which runs WebAssembly modules as a lightweight way to implement
>> cross-language transforms for languages which don't (yet) have an SDK
>> implementation.
>>
>> I've got a small proof of concept running in the Python SDK as a DoFn
>> with Wasmer as the WebAssembly runtime and simple support for marshalling
>> between the host and guest environment with the RowCoder. The module I've
>> constructed is mostly useless, but demonstrates the host copying the
>> encoded element into the guest's memory, the guest copying those bytes
>> elsewhere in its linear memory buffer, the guest calling back to the host
>> with the offset and size and the host copying and decoding from the guest's
>> memory.
>>
>> Any thoughts/interest? I'm not sure where I was going with this, since it
>> was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but I
>> can see a few use cases for this.
>>
>> Regards,
>>
>> Steve
>>
>

Re: Fun with WebAssembly transforms

Posted by Pablo Estrada <pa...@google.com>.
Would you open a pull request for it? Or at least share a branch? : )
Even if we don't want to merge it, it would be great to have a PR as a way
to showcase the work, its usefulness, and receive comments on this thread
once we can see something more specific.

On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <sj...@google.com>
wrote:

> Hi folks,
>
> I had some spare time yesterday and thought it'd be fun to implement a
> transform which runs WebAssembly modules as a lightweight way to implement
> cross-language transforms for languages which don't (yet) have an SDK
> implementation.
>
> I've got a small proof of concept running in the Python SDK as a DoFn with
> Wasmer as the WebAssembly runtime and simple support for marshalling
> between the host and guest environment with the RowCoder. The module I've
> constructed is mostly useless, but demonstrates the host copying the
> encoded element into the guest's memory, the guest copying those bytes
> elsewhere in its linear memory buffer, the guest calling back to the host
> with the offset and size and the host copying and decoding from the guest's
> memory.
>
> Any thoughts/interest? I'm not sure where I was going with this, since it
> was mostly just a "wouldn't it be cool if..." on a Monday afternoon, but I
> can see a few use cases for this.
>
> Regards,
>
> Steve
>