Posted to dev@hop.apache.org by Matt Casters <ma...@neo4j.com.INVALID> on 2021/04/20 07:59:40 UTC

[DISCUSS] Beam transform handler plugins

Shiny Hoppy people!

I've been wrestling with an API / architecture issue that needs some
resolution.
The topic at hand is the Apache Beam integration in the form of our
engines/beam plugin.

Currently, the handling of the various Beam-specific transforms is hard-coded
<https://github.com/apache/incubator-hop/tree/master/plugins/engines/beam/src/main/java/org/apache/hop/beam/pipeline/handler>
and I don't like it.
For example, a `Memory Group By` transform results in a GroupByKey being
created and applied to a Beam PCollection.

It would be ideal if we could move the code for said 'Memory Group By' Beam
logic to plugins/transforms/memgroupby.  However, that would require Apache
Beam dependencies to be sprinkled over a lot of plugins, which I don't like
either.

Right now we solve the dependency in the dependencies.xml file where we
have something like ../../transforms/memgroupby to drag the required jar
file(s) in.

Here is what I would like to do:
- Create a bunch of extra modules in plugins/engines/beam for memgroupby,
kafka and others.
- Create a new plugin type which gets registered by the Beam engine plugin:
a Beam Transform Handler Plugin Type.  Every module would then depend on the
Beam parent and would implement a Beam transform handler.
- The parent dependencies.xml file will be gone and replaced by a bunch of
one-liners in the sub-modules.
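
To make the proposal a bit more concrete, here is a rough sketch of what such
a handler plugin type could look like. Everything in it is an assumption for
illustration only: the names BeamTransformHandler, HandlerRegistry and
MemGroupByHandler are hypothetical, the "memgroupby" ID is just the module
name used as a placeholder, and a plain list of step names stands in for a
Beam PCollection so the sketch runs without any Beam dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: every name below is hypothetical, and the "pipeline" is a plain
// list of step names standing in for a Beam PCollection.
final class BeamHandlerSketch {

  /** Contract each Beam sub-module would implement (hypothetical). */
  interface BeamTransformHandler {
    /** ID of the Hop transform this handler knows how to translate. */
    String transformId();

    /** Returns a new step list with this handler's Beam step(s) appended. */
    List<String> apply(List<String> pipelineSteps);
  }

  /** Stand-in for the new plugin type: looks handlers up by transform ID. */
  static final class HandlerRegistry {
    private final Map<String, BeamTransformHandler> handlers = new LinkedHashMap<>();

    void register(BeamTransformHandler handler) {
      handlers.put(handler.transformId(), handler);
    }

    boolean supports(String transformId) {
      return handlers.containsKey(transformId);
    }

    List<String> translate(String transformId, List<String> pipelineSteps) {
      BeamTransformHandler handler = handlers.get(transformId);
      if (handler == null) {
        // The handler module was removed or never installed.
        throw new IllegalStateException(
            "No Beam handler plugin installed for transform '" + transformId + "'");
      }
      return handler.apply(pipelineSteps);
    }
  }

  /** Would live in its own sub-module under the Beam engine plugin. */
  static final class MemGroupByHandler implements BeamTransformHandler {
    @Override
    public String transformId() {
      return "memgroupby"; // placeholder ID, not necessarily the real plugin ID
    }

    @Override
    public List<String> apply(List<String> pipelineSteps) {
      List<String> result = new ArrayList<>(pipelineSteps);
      result.add("GroupByKey"); // stands in for applying GroupByKey to a PCollection
      return result;
    }
  }

  public static void main(String[] args) {
    HandlerRegistry registry = new HandlerRegistry();
    registry.register(new MemGroupByHandler());

    System.out.println(registry.translate("memgroupby", List.of("Read")));
    System.out.println(registry.supports("kafka")); // no Kafka handler registered
  }
}
```

The point of the lookup by ID is that the Beam engine no longer needs to know
the handlers at compile time: removing a sub-module simply means its transform
is reported as unsupported instead of breaking the build.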

This way someone who is not interested in, say, 'Kafka' can still remove
it from the plugins, albeit in 2 plugin folders.

Let me know if you have any objections or better ideas. I've racked my
brain for a long time now to find a better way, so any help is welcome.

Cheers,
Matt