Posted to commits@devlake.apache.org by GitBox <gi...@apache.org> on 2022/06/08 06:21:23 UTC

[GitHub] [incubator-devlake] klesh opened a new issue, #2117: Partial update support for `api_collector`/`extractor` and `data_converter`

klesh opened a new issue, #2117:
URL: https://github.com/apache/incubator-devlake/issues/2117

   ## Description
   
   In order to support differential data collection for `changelogs`, we filter issues by their `updated` date and select only those where `updated` > `changelog_updated`. This way we collect only the new portion of the data instead of the full set, which drastically reduces collection time. However, this approach depends on an extra field on the `issues` table, which is not elegant: we have to update `changelog_updated` whenever issue changelogs are collected successfully, and because `api_extractor` does not support partial updates, we have to call `db.Update` directly in `changelog_collector`. That side effect makes the collector less portable.
   
   Issue #1711 tried to remove those fields and calculate `changelog_updated` dynamically from the `jira_changelogs` table. @mindlesscloud later showed this to be unreliable, since collection or extraction might fail and leave data missing. We agreed to keep the original approach until a better option emerges.
   
   The same problem applies to `remotelinks` and `worklogs`.
   
   
   ## Describe the solution you'd like
   1. Close #1711
   2. Make `api_collector` / `extractor` and other helpers support partial updates by introducing a special struct, `PartialUpdate`
   3. Update Jira worklog/remotelink/changelog collection to use `PartialUpdate` instead of calling `db` directly
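
The issue does not define `PartialUpdate`, so the following Go sketch is only one possible shape, under stated assumptions: the field names (`Table`, `Key`, `Set`) and the in-memory `Store` are hypothetical and do not reflect any existing DevLake helper. The idea shown is that a helper overwrites only the listed columns and leaves the rest of the row alone.

```go
package main

import "fmt"

// PartialUpdate is a hypothetical shape for the proposed struct: it names a
// table, the key of the row to touch, and only the columns to overwrite.
type PartialUpdate struct {
	Table string
	Key   string
	Set   map[string]interface{}
}

// Row and Store are in-memory stand-ins for a database, for illustration only.
type Row map[string]interface{}
type Store map[string]map[string]Row // table -> key -> row

// Apply overwrites only the columns listed in u.Set and leaves every other
// column of the row untouched. It returns an error if the row does not exist.
func (s Store) Apply(u PartialUpdate) error {
	row, ok := s[u.Table][u.Key]
	if !ok {
		return fmt.Errorf("no row %q in table %q", u.Key, u.Table)
	}
	for col, val := range u.Set {
		row[col] = val
	}
	return nil
}

func main() {
	store := Store{"issues": {"A-1": Row{"title": "Fix login", "changelog_updated": nil}}}
	err := store.Apply(PartialUpdate{
		Table: "issues",
		Key:   "A-1",
		Set:   map[string]interface{}{"changelog_updated": "2022-06-08T06:21:23Z"},
	})
	fmt.Println(err, store["issues"]["A-1"])
}
```

With something along these lines, a collector could emit a `PartialUpdate` value and let the helper perform the write, rather than calling `db` itself.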
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] klesh commented on issue #2117: Partial update support for `api_collector`/`extractor` and `data_converter`

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2117:
URL: https://github.com/apache/incubator-devlake/issues/2117#issuecomment-1154946426

   This is a dead end: records get deleted every time we run `issue_extractor`, which means `changelog_updated` gets deleted as well.
   Thus differential changelog collection has not been working, and will not work with this approach.
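
The failure mode can be reproduced with a small in-memory Go sketch. `IssueRow` and `reExtract` are hypothetical names, and the delete-then-insert behaviour is modelled only from the description above, not taken from the actual `issue_extractor` code.

```go
package main

import "fmt"

// IssueRow is a hypothetical, simplified row of the `issues` table.
type IssueRow struct {
	Key              string
	Title            string
	ChangelogUpdated string // empty means "never collected"
}

// reExtract mimics an extractor that deletes existing rows and re-inserts
// them from raw data. The raw data carries no changelog_updated value, so
// whatever was stored in that column is lost.
func reExtract(table map[string]IssueRow, raw []IssueRow) {
	for k := range table {
		delete(table, k)
	}
	for _, r := range raw {
		table[r.Key] = r
	}
}

func main() {
	table := map[string]IssueRow{
		"A-1": {Key: "A-1", Title: "Fix login", ChangelogUpdated: "2022-06-08"},
	}
	reExtract(table, []IssueRow{{Key: "A-1", Title: "Fix login"}})
	fmt.Printf("%q\n", table["A-1"].ChangelogUpdated) // prints ""
}
```

After the re-extraction every issue looks as if its changelogs were never collected, so the next run falls back to a full collection.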
   




[GitHub] [incubator-devlake] klesh commented on issue #2117: Partial update support for `api_collector`/`extractor` and `data_converter`

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2117:
URL: https://github.com/apache/incubator-devlake/issues/2117#issuecomment-1154949398

   I tried another technique: comparing `issue.updated` with `max(changelog.created)`:
   ![image](https://user-images.githubusercontent.com/61080/173545555-ee1133b0-e868-413b-87c9-efbdc8a9f541.png)
   
   This almost did the job, except:
   ![image](https://user-images.githubusercontent.com/61080/173545664-b74c9a55-7b79-4c9f-aaf0-f5ebdbd32ee8.png)
   
   That is because a comment causes `issue.updated` to be updated without generating a `changelog`.
   




[GitHub] [incubator-devlake] klesh closed issue #2117: Partial update support for `api_collector`/`extractor` and `data_converter`

Posted by GitBox <gi...@apache.org>.
klesh closed issue #2117: Partial update support for `api_collector`/`extractor` and `data_converter`
URL: https://github.com/apache/incubator-devlake/issues/2117




[GitHub] [incubator-devlake] klesh commented on issue #2117: Partial update support for `api_collector`/`extractor` and `data_converter`

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2117:
URL: https://github.com/apache/incubator-devlake/issues/2117#issuecomment-1155231301

   Will be addressed by #2189 

