Posted to dev@streampipes.apache.org by GitBox <gi...@apache.org> on 2023/01/05 10:49:53 UTC

[GitHub] [streampipes] tenthe created a discussion: StreamPipes Structure

GitHub user tenthe created a discussion: StreamPipes Structure

# Background
This discussion is about the **structure and design** of **StreamPipes** and its **APIs**. 
Recently we have been working on some issues to reformat the code. During this work, I realized that some concepts within the software are not entirely clear and that we also have several legacy modules that can probably be removed. I wasn't quite sure where to start, so I took some time to get an overview of the current implementation.

# Goal
I would like to look at the entire project and discuss with you the code structure, APIs, libraries, and third-party services (e.g. databases, message brokers). 
**The goal is to clean up the code base and create software that is easy to develop, easy to use, and easy to maintain.**

# Overview
Here is a first **overview of the different roles, StreamPipes services and other dependencies**. 
This by no means claims to be complete. Rather, it is intended to serve as a starting point for discussion.
<img width="759" alt="grafik" src="https://user-images.githubusercontent.com/5279561/210762764-4f84d017-de36-4293-b3eb-8f772cbf0ecc.png">

# Next Steps
- I started with an overview of the current Java modules, see post below.
- Additionally, we should create a similar overview for the REST APIs. I think they are a good starting point for restructuring and refactoring.
- The artifacts (e.g., architecture diagrams) and descriptions that emerge from the discussion should be integrated into the documentation to help others quickly understand the design decisions.

# Contribution
🎯 This discussion should only be the starting point.
👏 I think we have already made great progress in the last few months. Thank you to everyone who has participated! 
🗣️ Any kind of feedback and discussion is highly welcome! 

GitHub link: https://github.com/apache/streampipes/discussions/1038

----
This is an automatically sent email for dev@streampipes.apache.org.
To unsubscribe, please send an email to: dev-unsubscribe@streampipes.apache.org


[GitHub] [streampipes] bossenti added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user bossenti added a comment to the discussion: StreamPipes Structure

I really like your suggestion, let's do it step by step 💪🏼 

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4843442



[GitHub] [streampipes] tenthe added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: StreamPipes Structure

# Overview StreamPipes Modules
I have looked at all the modules we have in our Java codebase and tried to summarize their main intent. I also added some discussion points that I noticed.

- **client** / **client-python**
	- Clients to interact with the StreamPipes API
- **commons**
    - Constants, Exceptions, Networking, Parser, Random(Generators), ZIP
- **config**
    - Configurations that can be changed by users, as well as configurations for the core service
- **connect-management**
    - Contains the adapter management that is required in the service extensions at runtime
    - `DISCUSS` Should we rename the classes?
- **data-explorer**
	-  Application logic to query data from the time-series database
    - `DISCUSS` Should we change the module name?
    - `DISCUSS` Which classes should be moved to the **REST modules**?
    - `DISCUSS` This module is strongly coupled to InfluxDB
- **data-explorer-commons**
    -  Configurations, InfluxStore, ImageStore, TimeSeriesStore
    - `DISCUSS` Can we move the code to the **data-explorer** module?
- **data-export**
    - Manages the export / import of StreamPipes configurations
    - `DISCUSS` Think about a different name. Can be confused with export of data explorer data.
- **dataformat**
    - These modules contain the implementations of the data formats
    - (cbor, fst, json, smile)
- **extensions**
    - Contains all modules with adapters, processors, and sinks
- **extensions-api**
    - Interfaces for the extensions (See module below)
- **extensions-management**
    - Application logic relevant for the modules that are implemented in **extensions**
- **integration-tests**
    - Contains tests for services that have third-party requirements
- **logging**
    - `DISCUSS` Do we still need this module or is it legacy code?
- **mail**
    - Logic to send emails via the system
- **maven-plugin**
    - Maven plugin to extract pipeline element information from code and provide it for the website documentation
- **measurement-units**
    - Logic to deal with different units
- **messaging**
    - Implementations for the internally used message brokers
    - (jms, kafka, mqtt, nats)
- **model**
    - Contains all model classes of StreamPipes
    - This model is also used to automatically generate the model for TypeScript
- **model-client**
    - Model classes that are only required by the UI
    - This model is also used to automatically generate the model for TypeScript
- **model-shared**
    - Only contains annotations `TsIgnore` & `TsModel`
- **performance-tests**
    - `DISCUSS` Do we still need this module or is it legacy code?
- **pipeline-management**
	- Contains application logic for the pipeline editor
- **platform-service**
    - Currently contains Resources for the data lake
    - `DISCUSS` Move those classes into other modules. How will we structure the REST modules?
- **resource-management**
    - `DISCUSS` What is managed in this module and what could be moved to `pipeline-management` or `connect-management`?
- **rest**
    - The `impl` package currently contains most of the REST endpoints
    - **rest-core-base**
        - Only contains two classes `AbstractAuthGuardedRestResource` & `AbstractRestResource`
    - **rest-extensions**
        - REST endpoints of **service extensions**
        - Currently mainly endpoints to manage connect & pipelines
    - **rest-shared**
        - (annotation, impl, serializer, util)
- **sdk**
    - builder, extractor, helpers
    - Application logic to extend StreamPipes functionalities
- **sdk-bundle**
    - Contains no classes, is only used to bundle several packages
    - `DISCUSS` Do we need this or can we move this to **sdk** instead?
- **security-jwt**
    - Logic to handle JSON Web Tokens
- **serializers-json**
    - Contains one class, the `JacksonSerializer`
    - `DISCUSS` Do we require a separate module for this?
- **service-base**
    - Base module for StreamPipes services (**service core** and **service extension**)
- **service-core**
    - Is executed when StreamPipes is started
    - Contains migration scripts and starts up the REST endpoints
- **service-discovery**
    - Only contains class `SpServiceDiscovery`
	- **service-discovery-api**
		- Interface of the service discovery 
	- **service-discovery-consul**
		- Consul implementation of the service discovery interface
- **service-extensions**
    - Module for extension services that contain adapters, processing elements, sinks, and functions at runtime
    - Contains `ExtensionsModelSubmitter` that is implemented by all extension services
- **sources**
    - Abstract classes `AbstractAdapterIncludedStream` & `AbstractAlreadyExistingStream`
    - Required for streams that are not created by connect adapters
    - `DISCUSS` Should we continue to support this or only rely on the connect API?
- **storage-api**
    - Interfaces of all classes that can be persisted in the database
- **storage-couchdb**
    - CouchDB implementation of the **storage-api**
- **storage-management**
    - Contains two classes `StorageDispatcher` & `StorageManager`
- **test-utils**
	- Contains utility classes to ease the creation of unit tests
- **user-management**
	- Responsible for user management within StreamPipes
	- Packages: authentication, encryption, jwt, model, service
- **vocabulary**
    - Contains classes with different vocabularies
- **wrapper**
    - Base module for the different wrappers
    - **wrapper-distributed**
        - Base module for the distributed wrappers
		- Contains one abstract class `DistributedRuntime`
    - **wrapper-flink**
        - Wrapper for Flink
    - **wrapper-kafka-streams**
        - Wrapper for Kafka Streams
    - **wrapper-python**
        - Python wrapper
        - `DISCUSS` Is this module still relevant for the new Python integration?
    - **wrapper-siddhi**
        - Wrapper for Siddhi engine
    - **wrapper-standalone**
        - Module for standalone wrappers that are not distributed
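
To make the interface/implementation split behind **storage-api** and **storage-couchdb** a bit more concrete, here is a minimal Java sketch. All names in it (`PipelineStorage`, `InMemoryPipelineStorage`, `PipelineService`, `StorageSketch`) are hypothetical and not taken from the codebase; the in-memory class only stands in for a database-backed implementation such as the CouchDB one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an interface that would live in an API module.
interface PipelineStorage {
  void store(String id, String pipelineJson);

  Optional<String> findById(String id);
}

// Hypothetical in-memory implementation; a database module would provide
// its own class implementing the same interface.
class InMemoryPipelineStorage implements PipelineStorage {
  private final Map<String, String> documents = new HashMap<>();

  @Override
  public void store(String id, String pipelineJson) {
    documents.put(id, pipelineJson);
  }

  @Override
  public Optional<String> findById(String id) {
    return Optional.ofNullable(documents.get(id));
  }
}

// Application logic depends only on the interface, never on a concrete database.
class PipelineService {
  private final PipelineStorage storage;

  PipelineService(PipelineStorage storage) {
    this.storage = storage;
  }

  boolean exists(String id) {
    return storage.findById(id).isPresent();
  }
}

class StorageSketch {
  public static void main(String[] args) {
    PipelineStorage storage = new InMemoryPipelineStorage();
    storage.store("p1", "{\"name\":\"demo\"}");
    PipelineService service = new PipelineService(storage);
    System.out.println(service.exists("p1")); // prints true
    System.out.println(service.exists("p2")); // prints false
  }
}
```

The same shape would apply to **service-discovery-api** and **service-discovery-consul**: callers depend on the interface, so the implementation can be swapped without touching the application logic.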

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4600352



[GitHub] [streampipes] tenthe added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: StreamPipes Structure

Maybe we can work through the items one by one. My suggestion would be to first have a short discussion about it and then make an issue out of it. 

What do you think about that? 
There should also be a lot of topics that are good first issues once it is clear what to do.

I'm afraid that if we immediately create an issue for everything, things often won't be clear enough yet and the issues won't be worked on for a long time.

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4826686



[GitHub] [streampipes] tenthe added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: StreamPipes Structure

I am currently working on replacing the data set adapters as described in #1115.
With this change we will hopefully get rid of some dependencies and simplify the code structure.

Additionally, my suggestion would be to start with the modules/discussions that I wasn't sure we still needed. Maybe we can delete or restructure some of them to reduce the number of modules. This shouldn't take much time.
What do you think about this?


GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4794644



[GitHub] [streampipes] bossenti edited a comment on the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user bossenti edited a comment on the discussion: StreamPipes Structure

@tenthe thanks a lot for starting the discussion and preparing it so well 🙏🏼 
So where and how do we want to start?
Do we want to have a dedicated discussion for each `DISCUSS` entry? Or just start a thread for each within this discussion?

I think a great outcome of the conceptual work would be to have the overview you provided clearly documented somewhere.

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4790612



[GitHub] [streampipes] bossenti added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user bossenti added a comment to the discussion: StreamPipes Structure

Sounds like a solid first step 👍🏼 
Do we want to create one or more issues for that?

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4822480





[GitHub] [streampipes] bossenti added a comment to the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user bossenti added a comment to the discussion: StreamPipes Structure

@tenthe thanks a lot for starting the discussion and preparing it so well 🙏🏼 
So where and how do we want to start?
Do we want to have a dedicated discussion for each `DISCUSS` entry here? Or just start a thread for each within this discussion?

I think a great outcome of the conceptual work would be to have the overview you provided clearly documented somewhere.

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4790612



[GitHub] [streampipes] tenthe edited a comment on the discussion: StreamPipes Structure

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe edited a comment on the discussion: StreamPipes Structure

# Overview StreamPipes Modules
I have looked at all the modules we have in our Java codebase and tried to summarize their main intent. I also added some discussion points that I noticed.

- **client** / **client-python**
	- Clients to interact with the StreamPipes API
- **commons**
    - Constants, Exceptions, Networking, Parser, Random(Generators), ZIP
- **config**
    - Configurations that can be changed by users, as well as configurations for the core service
- **connect-management**
    - Contains the adapter management that is required in the service extensions at runtime
    - `DISCUSS` Should we rename the classes?
- **data-explorer**
	-  Application logic to query data from the time-series database
    - `DISCUSS` Should we change the module name?
    - `DISCUSS` Which classes should be moved to the **REST modules**?
    - `DISCUSS` This module is strongly coupled to InfluxDB
- **data-explorer-commons**
    -  Configurations, InfluxStore, ImageStore, TimeSeriesStore
    - `DISCUSS` Can we move the code to the **data-explorer** module?
- **data-export**
    - Manages the export / import of StreamPipes configurations
    - `DISCUSS` Think about a different name. Can be confused with export of data explorer data.
- **dataformat**
    - These modules contain the implementations of the data formats
    - (cbor, fst, json, smile)
- **extensions**
    - Contains all modules with adapters, processors, and sinks
- **extensions-api**
    - Interfaces for the extensions (See module below)
- **extensions-management**
    - Application logic relevant for the modules that are implemented in **extensions**
- **integration-tests**
    - Contains tests for services that have third-party requirements
- **logging**
    - `DISCUSS` Do we still need this module or is it legacy code?
- **mail**
    - Logic to send emails via the system
- **maven-plugin**
    - Maven plugin to extract pipeline element information from code and provide it for the website documentation
- **measurement-units**
    - Logic to deal with different units
- **messaging**
    - Implementations for the internally used message brokers
    - (jms, kafka, mqtt, nats)
- **model**
    - Contains all model classes of StreamPipes
    - This model is also used to automatically generate the model for TypeScript
- **model-client**
    - Model classes that are only required by the UI
    - This model is also used to automatically generate the model for TypeScript
- **model-shared**
    - Only contains annotations `TsIgnore` & `TsModel`
- performance-tests `REMOVED`
- **pipeline-management**
	- Contains application logic for the pipeline editor
- **platform-service**
    - Currently contains Resources for the data lake
    - `DISCUSS` Move those classes into other modules. How will we structure the REST modules?
- **resource-management**
    - `DISCUSS` What is managed in this module and what could be moved to `pipeline-management` or `connect-management`?
- **rest**
    - The `impl` package currently contains most of the REST endpoints
    - **rest-core-base**
        - Only contains two classes `AbstractAuthGuardedRestResource` & `AbstractRestResource`
    - **rest-extensions**
        - REST endpoints of **service extensions**
        - Currently mainly endpoints to manage connect & pipelines
    - **rest-shared**
        - (annotation, impl, serializer, util)
- **sdk**
    - builder, extractor, helpers
    - Application logic to extend StreamPipes functionalities
- **sdk-bundle**
    - Contains no classes, is only used to bundle several packages
    - `DISCUSS` Do we need this or can we move this to **sdk** instead?
- **security-jwt**
    - Logic to handle JSON Web Tokens
- **serializers-json**
    - Contains one class, the `JacksonSerializer`
    - `DISCUSS` Do we require a separate module for this?
- **service-base**
    - Base module for StreamPipes services (**service core** and **service extension**)
- **service-core**
    - Is executed when StreamPipes is started
    - Contains migration scripts and starts up the REST endpoints
- **service-discovery**
    - Only contains class `SpServiceDiscovery`
	- **service-discovery-api**
		- Interface of the service discovery 
	- **service-discovery-consul**
		- Consul implementation of the service discovery interface
- **service-extensions**
    - Module for extension services that contain adapters, processing elements, sinks, and functions at runtime
    - Contains `ExtensionsModelSubmitter` that is implemented by all extension services
- **sources**
    - Abstract classes `AbstractAdapterIncludedStream` & `AbstractAlreadyExistingStream`
    - Required for streams that are not created by connect adapters
    - `DISCUSS` Should we continue to support this or only rely on the connect API?
- **storage-api**
    - Interfaces of all classes that can be persisted in the database
- **storage-couchdb**
    - CouchDB implementation of the **storage-api**
- **storage-management**
    - Contains two classes `StorageDispatcher` & `StorageManager`
- **test-utils**
	- Contains utility classes to ease the creation of unit tests
- **user-management**
	- Responsible for user management within StreamPipes
	- Packages: authentication, encryption, jwt, model, service
- **vocabulary**
    - Contains classes with different vocabularies
- **wrapper**
    - Base module for the different wrappers
    - **wrapper-distributed**
        - Base module for the distributed wrappers
		- Contains one abstract class `DistributedRuntime`
    - **wrapper-flink**
        - Wrapper for Flink
    - **wrapper-kafka-streams**
        - Wrapper for Kafka Streams
    - **wrapper-python**
        - Python wrapper
        - `DISCUSS` Is this module still relevant for the new Python integration?
    - **wrapper-siddhi**
        - Wrapper for Siddhi engine
    - **wrapper-standalone**
        - Module for standalone wrappers that are not distributed
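
As a rough illustration of how marker annotations such as `TsModel` and `TsIgnore` from **model-shared** could steer the TypeScript model generation, here is a small Java sketch. The two annotations are re-declared locally with assumed targets and retention, and `ExamplePipeline` and `TsFieldSelector` are hypothetical names; the real generator and the real annotation definitions may work differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Local re-declaration for illustration only (assumed target and retention).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TsModel {}

// Local re-declaration for illustration only (assumed target and retention).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface TsIgnore {}

// Hypothetical model class: one field is exported, one is hidden from the UI model.
@TsModel
class ExamplePipeline {
  String name;

  @TsIgnore
  String internalState;
}

class TsFieldSelector {
  // Returns the field names a generator would export for the given class:
  // nothing for classes without @TsModel, otherwise all fields not marked @TsIgnore.
  static List<String> exportedFields(Class<?> type) {
    List<String> names = new ArrayList<>();
    if (!type.isAnnotationPresent(TsModel.class)) {
      return names;
    }
    for (Field field : type.getDeclaredFields()) {
      if (!field.isSynthetic() && !field.isAnnotationPresent(TsIgnore.class)) {
        names.add(field.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(exportedFields(ExamplePipeline.class)); // prints [name]
  }
}
```

If the annotations really are the only content of **model-shared**, a sketch like this also shows why the module is so small: the annotations carry no logic themselves and only mark what the generator should pick up.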

GitHub link: https://github.com/apache/streampipes/discussions/1038#discussioncomment-4600352
