Posted to commits@devlake.apache.org by GitBox <gi...@apache.org> on 2023/01/11 02:44:39 UTC

[GitHub] [incubator-devlake] warren830 opened a new issue, #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

warren830 opened a new issue, #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   When gitextractor is preparing data, the pipeline cannot be cancelled because libgit2 does not support cancellation.
   
   ### What do you expect to happen
   
   Even if gitextractor is preparing data, we should be able to cancel the whole pipeline
   
   ### How to reproduce
   
   In normal mode, collect data from more than 2 GitHub/GitLab repos, then click `cancel` while the pipeline is running.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   main
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1468050571

   Maybe just leave `gitextractor` running and ignore its output? Could we do it this way?




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1619865940

   @stultus Was it stuck at the clone stage? If so, the only solution for now is to restart the `devlake` container. 🥲




[GitHub] [incubator-devlake] klesh closed issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh closed issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data 
URL: https://github.com/apache/incubator-devlake/issues/4188




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1482363500

   @d4x1 I want to express my gratitude for the confirmation - it is very thoughtful and much appreciated! ❤️




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1482358701

   Well, it has been a while since we tested it.
   Maybe we should do another round. IIRC, two things impacted the performance:
   
   1. Iterating over all commits
   2. Calculating the diff between 2 commits
   
   Can we write a benchmark PoC that covers those actions and run it on a couple of medium-sized repositories, like `clickhouse` and `pingcap`, so we can assess how slow it would be in the real world?
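
One possible shape for such a PoC, sketched in Go under stated assumptions: the two operations listed above sit behind a small interface so that a git2go-backed and a go-git-backed adapter could be timed by the same harness. `gitBackend`, `fakeBackend`, `walkAndDiff`, and `benchmarkBackend` are hypothetical names, and the in-memory fake only stands in for a real repository so the harness runs as-is; it says nothing about either library's actual speed.

```go
package main

import (
	"fmt"
	"testing"
)

// gitBackend is a hypothetical abstraction over the two operations to measure.
// Real adapters would wrap git2go or go-git; none are provided here.
type gitBackend interface {
	// ForEachCommit calls visit once per commit hash, in history order.
	ForEachCommit(visit func(hash string) error) error
	// DiffStat returns the number of changed files between two commits.
	DiffStat(from, to string) (int, error)
}

// fakeBackend is an in-memory stand-in with n synthetic commit hashes.
type fakeBackend struct{ hashes []string }

func newFakeBackend(n int) *fakeBackend {
	b := &fakeBackend{hashes: make([]string, n)}
	for i := range b.hashes {
		b.hashes[i] = fmt.Sprintf("%040x", i)
	}
	return b
}

func (b *fakeBackend) ForEachCommit(visit func(string) error) error {
	for _, h := range b.hashes {
		if err := visit(h); err != nil {
			return err
		}
	}
	return nil
}

// DiffStat pretends every pair of distinct commits changes exactly one file.
func (b *fakeBackend) DiffStat(from, to string) (int, error) {
	if from == to {
		return 0, nil
	}
	return 1, nil
}

// walkAndDiff iterates over all commits and diffs each commit against the
// previous one, returning the commit count and the total changed files.
func walkAndDiff(b gitBackend) (commits, changed int, err error) {
	prev := ""
	err = b.ForEachCommit(func(h string) error {
		commits++
		if prev != "" {
			n, derr := b.DiffStat(prev, h)
			if derr != nil {
				return derr
			}
			changed += n
		}
		prev = h
		return nil
	})
	return commits, changed, err
}

// benchmarkBackend times walkAndDiff with the standard testing harness,
// which can be called outside of "go test".
func benchmarkBackend(b gitBackend) testing.BenchmarkResult {
	return testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			if _, _, err := walkAndDiff(b); err != nil {
				tb.Fatal(err)
			}
		}
	})
}
```

Calling `benchmarkBackend` once per adapter on the same cloned repository and printing each `testing.BenchmarkResult` would give directly comparable ns/op figures.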




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1484886278

   Got it.




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1481491307

   > @d4x1 `git2go` was chosen because it has more features and is faster than `gogit`. I don't think it is urgent. Feel free to do so, much appreciated ❤️
   
   I found a comment here: https://github.com/apache/incubator-devlake/blob/main/backend/plugins/gitextractor/parser/clone.go#L33
   
   Replacing `git2go` with `gogit` will cost some performance. Is this acceptable?
   
   I think we should come to an agreement before doing it.




[GitHub] [incubator-devlake] github-actions[bot] commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1550514774

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.




[GitHub] [incubator-devlake] github-actions[bot] commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1599764597

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1607967315

   Keep this issue.




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1469554730

   @d4x1 `git2go` was chosen because it has more features and is faster than `gogit`.
   I don't think it is urgent. Feel free to do so, much appreciated ❤️




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1551671678

   I am still working on it and it will be updated.




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1484715170

   > > Well, it has been a while since we tested it. Maybe we should do another round. IIRC, two things impacted the performance:
   > > 
   > > 1. Iterating over all commits
   > > 2. Calculating the diff between 2 commits
   > > 
   > > Can we write a benchmark PoC that covers those actions and run it on a couple of medium-sized repositories, like `clickhouse` and `pingcap`, so we can assess how slow it would be in the real world?
   > 
   > A benchmark is necessary. +1 But I doubt whether pingcap or clickhouse is appropriate. Does the devlake team have any statistics on the pct90 size of the repos it will analyse? I think this varies greatly from scenario to scenario, and I have no idea how to pick an appropriate test repository.
   
   Nah, not really. There are not enough resources to conduct such a massive investigation, and gitext has never been a bottleneck compared to other areas, such as collecting data from APIs or chart loading speed.
   
   I would say randomly picking a couple of famous, long-lived open-source projects is sufficient at this point.




[GitHub] [incubator-devlake] klesh commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1469492253

   > Maybe just leave `gitextractor` running and ignore its output? Could we do it this way?
   
   That is one way to go. However, it is not reliable: too many `git clone` operations could get stuck in the background and eat up the system's file descriptors.
   
   I prefer the following options:
   - Get rid of `git2go` and use only `gogit`, which supports `context`. `git2go` has always been problematic since it depends on the `libgit2` C library. However, doing so would require a huge rewrite.
   - Find a way for `git2go` to support cancellation. I found some interesting discussions on the topic that suggest it might be possible:
     - [git - libgit2 - clone operation with Github - Stack Overflow](https://stackoverflow.com/questions/25207194/libgit2-clone-operation-with-github)
     - [Error when cancel a Repository.Clone · Issue #780 · libgit2/libgit2sharp](https://github.com/libgit2/libgit2sharp/issues/780)
   
    
   




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1469520105

   I see that `gogit` is already used when cloning a repo. I don't know why it wasn't chosen as the default from the beginning. Were there any special reasons?
   
   From a rough comparison of the stars and activity (commit history) of `gogit` and `git2go`, `gogit` looks like the better choice. I can help rewrite this part if it is not urgent.




[GitHub] [incubator-devlake] stultus commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "stultus (via GitHub)" <gi...@apache.org>.
stultus commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1619860229

   One of my blueprints has been stuck at the gitextractor section for the past 5 days. Is there any way to proceed to the next step?




[GitHub] [incubator-devlake] d4x1 commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "d4x1 (via GitHub)" <gi...@apache.org>.
d4x1 commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1482738351

   > Well, it has been a while since we tested it. Maybe we should do another round. IIRC, two things impacted the performance:
   > 
   > 1. Iterating over all commits
   > 2. Calculating the diff between 2 commits
   > 
   > Can we write a benchmark PoC that covers those actions and run it on a couple of medium-sized repositories, like `clickhouse` and `pingcap`, so we can assess how slow it would be in the real world?
   
   A benchmark is necessary. +1
   But I doubt whether pingcap or clickhouse is appropriate. Does the devlake team have any statistics on the pct90 size of the repos it will analyse?
   I think this varies greatly from scenario to scenario, and I have no idea how to pick an appropriate test repository.




[GitHub] [incubator-devlake] github-actions[bot] commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1666297062

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.




[GitHub] [incubator-devlake] github-actions[bot] commented on issue #4188: [Bug][Gitex] Unable to cancel gitextractor when preparing data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4188:
URL: https://github.com/apache/incubator-devlake/issues/4188#issuecomment-1426518015

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

