You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@devlake.apache.org by "keon94 (via GitHub)" <gi...@apache.org> on 2023/06/04 02:47:26 UTC

[GitHub] [incubator-devlake] keon94 opened a new issue, #5360: [Refactor][Framework] Break up Blueprint config in the database

keon94 opened a new issue, #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360

   ## What and why to refactor
   Blueprints are currently stored in a monolithic fashion in the database, with the plan and settings stored as encrypted as json blobs within them. The blob structure makes searching for blueprints by some blob attribute impossible except by querying for all blueprints and brute-force searching through them to filter for the one(s) we're looking for. This is bad for performance obviously.
   
   ## Describe the solution you'd like
   Break the plan and settings up into other tables and link them to the BP table via IDs. We'll have to decide which columns of these tables should be encrypted. At the moment, the main use case will be having at least a blueprint_scopes table, which will be referenced by blueprint_connections, referenced by blueprint_settings. Having these tables will be the minimum requirement of this ticket.  
   
   ## Related issues
   #5153 - introduced brute-force searching for blueprints by scopes. This ticket should optimize that.
   
   ## Additional context
   n/a
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] keon94 commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "keon94 (via GitHub)" <gi...@apache.org>.
keon94 commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1629770311

   @klesh About blueprint_scopes, the blueprint_connection_id would be referring to blueprint_connections.id field not blueprint_connections.connection_id. So I am trying to make the distinction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] github-actions[bot] commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1732439328

   This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] keon94 commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "keon94 (via GitHub)" <gi...@apache.org>.
keon94 commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1583209637

   @klesh Here's what I have in mind (simplified):
   ![image](https://github.com/apache/incubator-devlake/assets/25063936/0d8f378a-c8c8-4799-a911-1ed08397d675)
   
   Not sure versioning would make sense anymore? Because if we are normalizing data across SQL tables, then we automatically take care of that with migrations. I've also kept the Pipeline plan internals untouched (still JSON), but maybe we can think of having them normalized too (probably another PR to keep things simpler).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1630033568

   > @klesh `blueprint_id` is technically not needed (not even in connections) but I decided to put it there as a convenience since it's going to be immutable and we'll be often querying based on it (otherwise we'd need to do a bunch of Joins). `plan_id` replaces `plan` in the blueprint table after th refactor. Plans will get their own table. (another ticket)
   
   I see, let's tackle it one by one. `bp.settings` only for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1579845419

   @keon94 I think this is an important refactor we should do ASAP, would you mind posting your design first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1630006301

   > @klesh About blueprint_scopes, the blueprint_connection_id would be referring to blueprint_connections.id field not blueprint_connections.connection_id. So I am trying to make the distinction.
   
   You are right.
   Do we need `blueprint_scopes.blueprint_id`?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh closed issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh closed issue #5360: [Refactor][Framework] Break up Blueprint config in the database
URL: https://github.com/apache/incubator-devlake/issues/5360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1575896533

   Good idea.
   I don't think encryption is necessary for the `settings` field anymore, all sensitive information had been moved to other places already IIRC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1586891177

   @keon94 LGTM overall, some minor suggestion:
   - the `blueprint_connections.settings_id` should be `setting_id` singular.
   - we can drop the `version` column since it makes no sense without `JSON` 's flexibility, and we can migrate those tables in our migration scripts after the refactor.
   - either convention: `blueprint_setting_id` `blueprint_connection_id` or `setting_id` `connection_id` is good for me, but no MIX.
   - we can leave the `plan` out of this refactor and make another one for it. I would prefer we have `pipeline_stages` `pipeline_tasks` alongside `pipelines` as they are the models we used in our code to represent the `pipeline plan`, essentially, a `pipeline plan` is an array of `pipeline stages`, and a `pipeline stage` is an array of `pipeline tasks`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] keon94 commented on issue #5360: [Refactor][Framework] Break up Blueprint config in the database

Posted by "keon94 (via GitHub)" <gi...@apache.org>.
keon94 commented on issue #5360:
URL: https://github.com/apache/incubator-devlake/issues/5360#issuecomment-1630029938

   @klesh `blueprint_id` is technically not needed (not even in connections) but I decided to put it there as a convenience since it's going to be immutable and we'll be often querying based on it (otherwise we'd need to do a bunch of Joins). 
   `plan_id` replaces `plan` in the blueprint table after th refactor. Plans will get their own table. (another ticket)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org