Posted to dev@hop.apache.org by Matt Casters <ma...@neo4j.com.INVALID> on 2021/04/20 07:59:40 UTC
[DISCUSS] Beam transform handler plugins
Shiny Hoppy people!
I've been wrestling with an API / architecture issue that needs some
resolution.
The topic at hand is the Apache Beam integration in the form of our
engines/beam plugin.
Currently, the handling of the various Beam-specific transforms is hard-coded
<https://github.com/apache/incubator-hop/tree/master/plugins/engines/beam/src/main/java/org/apache/hop/beam/pipeline/handler>
and I don't like it.
For example, a `Memory Group By` transform results in a GroupByKey being
created and applied to a Beam PCollection.
It would be ideal if we could move the Beam logic for said 'Memory Group By'
to plugins/transforms/memgroupby. However, that would require Apache Beam
dependencies to be sprinkled over a lot of plugins, which I don't like either.
Right now we solve the dependency in the dependencies.xml file where we
have something like ../../transforms/memgroupby to drag the required jar
file(s) in.
Here is what I would like to do:
- Create a bunch of extra modules in plugins/engines/beam for memgroupby,
kafka and others.
- Create a new plugin type which gets registered by the Beam engine plugin:
a Beam Transform Handler Plugin Type. Every module would then be dependent on
the Beam parent and would implement a Beam transform handler.
- The parent dependencies.xml file will be gone and replaced by a bunch of
one-liners in the sub-modules.
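
To make the idea a bit more concrete, here is a minimal sketch of how such a
handler plugin type could be looked up per transform. It is an illustration
only: every name in it (TransformHandler, Registry, the "MemoryGroupBy" ID,
the generic fallback) is a made-up assumption and not existing Hop or Beam
API, and a plain list of strings stands in for the Beam PCollection work so
that the example runs without any Beam dependency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HandlerRegistrySketch {

  /** Hypothetical plugin contract: one implementation per Hop transform type. */
  interface TransformHandler {
    /** ID of the Hop transform plugin this handler translates (assumed naming). */
    String transformPluginId();

    /** Stand-in for applying Beam transforms: records the step it would add. */
    void handle(String transformName, List<String> plan);
  }

  /** Would live in its own (hypothetical) sub-module under the Beam engine plugin. */
  static class MemGroupByHandler implements TransformHandler {
    public String transformPluginId() {
      return "MemoryGroupBy";
    }

    public void handle(String transformName, List<String> plan) {
      plan.add(transformName + ": GroupByKey");
    }
  }

  /** Assumed fallback for transforms that have no dedicated handler. */
  static class GenericHandler implements TransformHandler {
    public String transformPluginId() {
      return "*";
    }

    public void handle(String transformName, List<String> plan) {
      plan.add(transformName + ": generic transform");
    }
  }

  /** Replaces a hard-coded if/else chain with a lookup by transform plugin ID. */
  static class Registry {
    private final Map<String, TransformHandler> handlers = new HashMap<>();
    private final TransformHandler fallback;

    Registry(TransformHandler fallback) {
      this.fallback = fallback;
    }

    void register(TransformHandler handler) {
      handlers.put(handler.transformPluginId(), handler);
    }

    TransformHandler find(String pluginId) {
      return handlers.getOrDefault(pluginId, fallback);
    }
  }

  /** Each row of transforms is {pluginId, transformName}. */
  static List<String> buildPlan(Registry registry, String[][] transforms) {
    List<String> plan = new ArrayList<>();
    for (String[] transform : transforms) {
      registry.find(transform[0]).handle(transform[1], plan);
    }
    return plan;
  }

  public static void main(String[] args) {
    Registry registry = new Registry(new GenericHandler());
    registry.register(new MemGroupByHandler());
    String[][] pipeline = {{"MemoryGroupBy", "Sum per customer"}, {"Dummy", "Pass through"}};
    System.out.println(buildPlan(registry, pipeline));
  }
}
```

With this shape, dropping the memgroupby sub-module would simply mean its
handler never gets registered, and the engine falls back to whatever default
behaviour it has for unknown transforms.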
This way someone who is not interested in, say, 'Kafka' can still remove
it from the plugins, albeit in 2 plugin folders.
Let me know if you have any objections or better ideas. I've racked my
brain for a long time now to find a better way, so any help is welcome.
Cheers,
Matt